 Research
 Open Access
 Published:
Old drug repositioning and new drug discovery through similarity learning from drugtarget joint feature spaces
BMC Bioinformatics volume 20, Article number: 605 (2019)
Abstract
Background
Detection of new drugtarget interactions by computational algorithms is of crucial value to both old drug repositioning and new drug discovery. Existing machinelearning methods rely only on experimentally validated drugtarget interactions (i.e., positive samples) for the predictions. Their performance is severely impeded by the lack of reliable negative samples.
Results
We propose a method to construct highlyreliable negative samples for drug target prediction by a pairwise drugtarget similarity measurement and OCSVM with a highrecall constraint. On one hand, we measure the pairwise similarity between every two drugtarget interactions by combining the chemical similarity between their drugs and the Gene Ontologybased similarity between their targets. Then we calculate the accumulative similarity with all known drugtarget interactions for each unobserved drugtarget interaction. On the other hand, we obtain the signed distance from OCSVM learned from the known interactions with high recall (≥0.95) for each unobserved drugtarget interaction. After normalizing all accumulative similarities and signed distances to the range [0,1], we compute the score for each unobserved drugtarget interaction via averaging its accumulative similarity and signed distance. Unobserved interactions with lower scores are preferentially served as reliable negative samples for the classification algorithms. The performance of the proposed method is evaluated on the interaction data between 1094 drugs and 1556 target proteins. Extensive comparison experiments using four classical classifiers and one domain predictive method demonstrate the superior performance of the proposed method. A better decision boundary has been learned from the constructed reliable negative samples.
Conclusions
Proper construction of highlyreliable negative samples can help the classification models learn a clear decision boundary which contributes to the performance improvement.
Background
Detection of drugtarget interactions plays a vital role in both old drug repositioning and new drug discovery. It helps to identify new targets for existing drugs or predict new drugs for known targets. Currently, only a small number of drugtarget interactions are validated via wetlab experiments. A large proportion of interactions remain to be investigated by computational algorithms due to the high monetary and time cost of wetlab experiments.
Some specially designed machinelearning methods have been proposed recently in this research domain to overcome the challenging issues. These methods can be classified into three major categories: similaritybased methods, feature vectorbased methods and other methods. The similaritybased methods are all guided by the “guiltbyassociation” assumption that similar targets tend to be targeted by similar drugs and vice versa [1]. Ding et al. [2] had a comprehensive review on similaritybased machine learning methods. Models including nearest neighbor [3], kernelized Bayesian matrix factorization [4], networkbased inference [5], bipartite local models [3], gaussian interaction profile [6], and pairwise kernel method (PKM) [7] are summarized briefly and computationally compared in their work. The comparison results show that PKM performed the best in terms of AUC (area under the receiver operating characteristic curve).
In the feature vectorbased methods, each drugtarget pair (DTP) is represented as a fixedlength feature vector. The feature vector is encoded by various types of properties of drugs and targets, such as drug chemical structures and target sequences. For example, using the method proposed by Yu et al. [8], each drug is represented as a 1080feature vector consisting of constitutional descriptors, topological descriptors, 2D correlations, molecular properties and etc. Likewise, each protein is transformed into a 1080dimension feature vector. Merging them together, a set of 2160 features is taken to describe the drugprotein pairs for the Random Forest predictor. Luo et al. [9] developed DTINet, a computational pipeline which integrates diverse drugrelated information from heterogeneous data sources. DTINet can learn well from low dimensional vector representations for accurate interpretation of the topological properties of nodes in the heterogeneous network. Then, DTINet makes predictions based on these representations via a vector space projection scheme.
Apart from detecting the drugtarget interactions using similarity information or feature vectorbased representation, researchers also attempted to use other information such as biomedical documents for detection. Zhu et al. [10] proposed a probabilistic model named MAM to mine druggene relations from literature. MAM is composed of a mixture of aspect models, each of which is designed for one type of cooccurrence data and its learning algorithm. Their experimental results show that the prediction performance is improved via combining different types of cooccurrence data. Although potential drugtarget interactions can be mined from the biomedical documents, they have significant drawbacks such as low data quality and incompetency for novel relations.
These existing machinelearning approaches use the experimentally validated DTPs as positive samples, and use all or a random subset of unobserved DTPs as negative samples for the training of the classification models [3, 4, 6, 7]. As suggested by Ding [2], such negative samples might include potential drugtarget interactions not yet known, and would unavoidably result in inaccurate predictive results. Because the current machinelearning methods are severely impended by the lack of reliable negative samples, we develop a method to identify highly reliable negative samples of DTPs to improve the prediction performance.
Based on the “guiltbyassociation” assumption that similar drugs tend to interact with similar targets, the existing methods have achieved remarkable performance. Thus it is also reasonable to select reliable negative samples based on its converse negative proposition, i.e., a drug dissimilar to all drugs known to interact with a target is less likely to bind the target and vice versa.
Oneclass Support Vector Machine (OCSVM) [11] has demonstrated its advantages for classification in the absence of positive or negative samples [12]. It learns a hypersphere from the training data, ensuring most training data are in the hypersphere. OCSVM requires oneclass data only, thus it is an ideal technique to identify reliable negatives (i.e., outliners) for drugtarget prediction where only positives are available.
In this work, we propose a method to construct highlyreliable negative samples for drug target prediction by a pairwise drugtarget similarity measurement and OCSVM with a highrecall constraint. On one hand, we measure the pairwise similarity between every two drugtarget interactions by combining the chemical similarity between their drugs and the Gene Ontologybased similarity between their targets. Then we calculate the accumulative similarity with all known drugtarget interactions for every unobserved drugtarget interaction. On the other hand, we obtain the signed distance using OCSVM learned from the known interactions with high recall (≥0.95) for each unobserved drugtarget interaction. Unobserved DTPs with lower accumulative similarities or lower signed distances are less likely to be positives, thus of highprobability to be negatives. Consequently, we compute the score for each unobserved drugtarget interaction via averaging its accumulative similarity and signed distance after normalizing all accumulative similarities and signed distances to the range [0,1]. Unobserved interactions with lower scores are preferentially served as reliable negative samples for the classification algorithms. The specific negative number is determined by the negative sample ratio which will be discussed in the experiment section.
In the performance evaluation, we investigated impact of the ratio levels of negative samples on the prediction. We also demonstrated that the performance improvement brought by the reliable negative samples can be achieved for four different classical classifiers and for a domain specially designed prediction model (the pairwise kernel method PKM). Extensive experiments further show that the performances of all models have been improved significantly owing to the use of reliable negative samples.
Methods
Prediction framework
The prediction framework is illustrated in Fig. 1. It consists of three main components: credible negative sample generation, data representation, and drugtarget interaction prediction. First, unobserved DTPs are ranked in ascending order of their scores computed by the pairwise similarity and OCSVM. A corresponding number of them are sequentially selected to construct a reliable negative sample set. Then drugs and targets are represented as 5682dimensional and 4198dimensional vectors respectively according to their properties. Drugtarget vectors can be obtained by appending the target vector to the drug vector together. Following that, PCA (principal component analysis) is performed to reduce the dimension of raw drugtarget vectors. Finally, truncate drugtarget vectors with their labels are used to train the classifier for subsequent predictions.
Credible negative sample generation
It can be observed from Fig. 2 that a great number of targets only interact with one drug. It is indicative that there are abundant unobserved DTPs. Among these unobserved DTPs, some should be true interactions (positive samples) which are yet unobserved. Therefore, treating these unobserved DTPs all as negative samples by the traditional methods is unreasonable which may cause more false classifications [13]. A method to construct a reliable negative sample set becomes vital to achieve precise predictions.
Most existing machinelearning approaches developed for drugtarget interaction prediction are based on the assumption that similar drugs tend to bind similar targets and vice versa. Consequently, it is reasonable to select reliable negative samples based on its converse negative proposition that drugs dissimilar to all drugs known to bind a target are less likely to interact with the target and vice versa.
In this work, we propose to combine the converse negative proposition of the guiltbyassociation methods and the power of OCSVM to construct reliable negative samples. On one hand, we infer the probabilities of unobserved DTPs to be negatives by a pairwise drugtarget similarity measurement. To be specific, we first measure the similarities between drugs according to their chemical structures. Each drug is represented as a 1024dimensional fingerprint using the opensource tool CDK (Chemistry Development Kit) [14]. Formally for a drug d, it is represented as \(f^{d}\left (f_{i}^{d}\in \{0,1\}, i\in \{1,2,...,1024\}\right)\). Then the chemical similarity between two drugs, say drug d_{i} and drug d_{j}, is calculated by their Tanimoto score:
where ∧ and ∨ are bitwise “and” and “or” operators respectively; \(f_{l}^{i}\) and \(f_{l}^{j}\) are the l^{th} bit of fingerprints of drug d_{i} and drug d_{j} respectively. We also measure the similarity between two target proteins as the overlapping ratio of their related GO terms. Suppose GO^{i} and GO^{j} are the GO term sets for the target protein t_{i} and t_{j} respectively, the similarity score between t_{i} and t_{j} is defined as:
where ∩ and ∪ are “intersection” and “union” operators respectively. Then, we measure the pairwise similarity between two DTPs by combining the drug similarity and the target protein similarity. The pairwise similarity between the drugtarget pair p_{i}(d_{i}−t_{i}) and p_{j}(d_{j}−t_{j}) is given by:
Following that, we calculate the accumulative pairwise similarity with all the validated DTPs for each unobserved DTP. For an unobserved DTP p_{i}, its accumulative pairwise similarity is measured by:
where n is the total number of validated DTPs.
On the other hand, we infer the probabilities by OCSVM. Specifically, we use signed distances which denote the distances between the unobserved DTPs and the calculated OCSVM separating hyperplane to measure their probabilities (obtained using sklearn.svm.OneClassSVM.decision_function of the Python scikitlearn package). We feed OCSVM with all known DTPs and optimize its parameters via 5fold crossvalidation. A high recall constraint (≥0.95) is required to ensure that the majority of true DTPs are correctly predicted. With the optimized parameter settings (nu: 0.1, gamma: 0.05, recall=0.96), we obtained the signed distances for all unobserved DTPs.
After we get the accumulative pairwise similarities and signed distances for all DTPs, we normalize them to the range [0,1] via the formula 5 and 6 respectively.
where \({Sim}_{acc}^{max}\) and \({Sim}_{acc}^{min}\) are the maximum and minimum value of all accumulative pairwise similarities respectively, NSim_{acc}(p_{i}) and Sim_{acc}(p_{i}) are the normalized and raw accumulative pairwise similarity for DTP p_{i}.
where Dis_{max} and Dis_{min} are the maximum and minimum value of all signed distances, NDis(p_{i}) and Dis(p_{i}) are the normalized and raw signed distance for DTP p_{i}.
The “guiltbyassociation” methods assume that similar drugs are more likely to interact with similar targets [2]. Consequently, unobserved DTPs with lower accumulative similarities are less likely to be true positives and of highprobability to be true negatives. OCSVM predicts DTPs with higher normalized signed distances as positives, thus unobserved DTPs with lower normalized signed distances are more likely to be true negatives. Consequently, it’s reasonable to combine the above two factors as a single probability score as follows: Score(p_{i})=(NSim_{acc}(p_{i})+NDis(p_{i}))/2. Finally, we rank all unobserved DTPs in ascending order of their probability scores (screen negative list, see Additional file 1), and those with lowest scores are taken to form the set of negative samples. The specific number is determined by the negative sample ratio which is discussed in the experiment section.
Data representation via vectors
To perform the machinelearning task, we represent drugs and target proteins as vectors according to their properties. Specifically, each drug is represented as a 5682dimensional binary vector using its chemical substructures (881), sideeffects (4063) and substituents (738). The elements of the drug vector encode for the presence or absence of each property (i.e., chemical substructures/sideeffects/substituents) by 1 or 0. The drug chemical substructures correspond to the 881 chemical substructures defined in PubChem [15]. The sideeffects and substituents are 4063 unique sideeffects from SIDER [16] and 738 unique substituents from Drugbank [17, 18] respectively. Likewise, each protein is represented as a 4198dimensional binary vector where each bit denotes the presence or absence of the unique GO term by 1 or 0. Finally, we obtain the vector of any drugtarget pair by appending the target vector to the drug vector.
Prediction of drugtarget interactions
The dimension of each DTP vector is 9880 (5682 + 4981) and there are 1,702,264 (1,094*1,556) possible DTPs between 1094 drugs and 1556 targets used for experiments. Thus the size of the classification input could be around the order of magnitude of billion (9,880*1,702,264). Such high dimensionality will inevitably incur a huge time and computational cost. In this study, we employ PCA to map raw vectors of DTPs into lowerdimension space to speed up the prediction process. To be specific, we fit PCA with all training DTP vectors first. Then we transform both the training and test DTP vectors into lowerdimensional vectors. The PCN (principle component number) is set as 225 and the specific determining process is described in Additional file 2: Figure S2.
We label all positive samples (i.e., experimentally validated DTPs) as +1 and the reliable negative samples as 1. The compressed vectors of DTPs together with their labels are used to train a binary classifier (e.g., Random Forest) for subsequent prediction. The prediction performance is evaluated via 5fold cross validation: (1) samples in the gold standard are split into 5 roughly equalsized subsets; (2) each subset is taken in turn as the test set, and the remaining subsets are used as training set; (3) all results over the 5fold validation are used for evaluation. Evaluation metrics widely used in binary classification including AUC, precision, recall, and F1Score are employed to demonstrate the prediction performance.
Results and discussions
In this section, we first describe the details of the data used in this work. Then we investigate impacts of the ratio levels of negative samples to the positive samples on the prediction performance. Using the best setting for the negative sample ratio, we then evaluate the performance improvement brought by the reliable negative samples by four classical classifiers. Finally, we further demonstrate the superior performance of the proposed method using PKM, a stateoftheart predictive method proved to be the most powerful in Ding’s review [2].
Data resources
We use the benchmark dataset collected by Zheng et al. [19] for experiments. It consists of 1094 drugs and 1556 targets. Drug properties including chemical structures and substituent are extracted from DrugBank [17, 18], a comprehensive drug database. All sideeffects are downloaded from SIDER [16] and the GO terms of target proteins are retrieved from the EMBLEBI website [20]. The statistical details of the data sources are summarized in Table 1. The distribution of the experimentally validated drugtarget interaction pairs is illustrated in Fig. 2. Information of all researched drugs, targets and validated DTPs is available in Additional file 3. All the above data and the source codes are included in Additional file 4.
Impacts of negative sample ratio levels on the prediction performance
There are 11,819 experimentally validated interactions between the 1094 drugs and the 1556 target proteins used in this work. The remaining 1,690,445 (1094*1556  11,819) DTPs are unobserved DTPs, about 143 times the number of validated DTPs. It is impossible to take all unobserved DTPs as negative samples for prediction. In this work, we take all validated DTPs as positive samples. Similar to [21], we investigate how the performance varies when the ratio of negative samples (ratio relative to positive samples) increases from 0.5 to 5. The negative samples are sequentially extracted from the screen negative list (see “Credible negative sample generation” section). Four classical classifiers including Adaboost, LR (logistic regression), KNN (knearest neighbor) and RF (random forest) are employed for the training and prediction. All the classifiers are implemented using Python 2.7.13 (sklearn) with the default settings. The F1Scores achieved by these classifiers under different levels of negative sample ratios are depicted in Fig. 3. It can be seen that the prediction performance of all the four classifiers increases a bit with the negative sample ratio 0.5. Then the performance begins to decrease when the negative sample ratio is larger than 1. The same trend can be observed from the AUC shown in Additional file 2: Figure S1. The training time increases with the increasing number of training samples. Considering the prediction performance and time cost, we take 1 as the optimized negative sample ratio in the following experiments.
Much better performance than using accumulative pairwise similarity alone and randomly generated negative samples
To demonstrate the advantage of incorporating signed distances to accumulative pairwise similarities and the prediction performance improvement brought by the constructed reliable negative samples (Reliable, negatives sequentially extracted from the screen negative list), we compare them with negative samples inferred by accumulative pairwise similarities alone (Pairwise) and randomly generated negative samples (Random). The negative samples inferred by the accumulative pairwise similarities are negatives sequentially extracted from DTPs in ascending order of their accumulative pairwise similarities. The randomly generated negative samples are obtained by randomly sampling DTPs which are not in the positive samples. Apart from the negative samples, other settings are the same (NSR = 1). To avoid bias, Random is repeated 5 times and the average results are used for the final evaluation. The bar chart of the results are presented in Fig. 4 and the specific values are listed in Additional file 3: Table S1. It can be observed from Fig. 4 that all the four classifiers achieve significantly better performance on all the evaluation indices when using the reliable negative samples (colored yellow) than using negative samples inferred by the accumulative pairwise similarities (colored orange) and randomly generated negative samples (colored green). For example, Adaboost, KNN, Logistic Regression, and Random Forest’s F1Score improvements are 24.38%, 22.75%, 14.14% and 19.92% over Random respectively, and 14.6%, 22.35%, 7.82% and 6.89% over Pairwise respectively. Besides, with Pairwise, Adaboost, KNN, LR and RF achieves 8.5%, 0.3%, 5.86% and 12.19% F1Score improvements over Random respectively. The above results show that the proposed pairwise similarity and its combination with the OCSVM signed distances contribute the performance improvement. Better classification boundary has been successfully learned from the constructed reliable negative samples by these classifiers.
Significant improvement for the domain predictive method
To further confirm the superior prediction performance when using the reliable negative samples, we investigated whether the existing domain predictive methods can achieve better performance. Specifically, we conducted experiments for the domain prediction method PKM (pairwise kernel method), which was suggested to be the most powerful prediction method in Ding’s review [2]. PKM first computes the pairwise similarity between two drugtarget pairs as follows:
where sim_{d} and sim_{t} are the drug similarity and target similarity (drug chemical structure similarity and target GO similarity used in this work) respectively. Then PKM trains an SVM (support vector machine) with the pairwise similarity kernel to predict scores of arbitrary drugtarget pairs. As mentioned in the “Impacts of negative sample ratio levels on the prediction performance” section, we set the negative sample ratio as 1. We compare the prediction performance of PKM when it used the reliable negative samples or when it used randomly selected negative samples (the default setting of PKM). The results are shown in Fig. 5. We can see that the performance of PKM is improved on all the indices when using the reliable negative samples. In detail, the improvements on precision, recall, F1Score and AUC are significant at 22.1%, 40.3%, 33.4% and, 11.4% respectively. The result reveals that training with the reliable negative samples, PKM learned a better decision boundary indeed for a significant overall improvement on prediction performance.
Conclusions
In this work, we propose to improve drugtarget predictions by constructing highly reliable negative samples by a pairwise drugtarget similarity measurement and OCSVM (oneclass support vector machine) with a highrecall constraint. On one hand, we measure the pairwise similarity between every two drugtarget interactions by combining the chemical similarity between their drugs and the Gene Ontologybased similarity between their targets. Then we calculate the accumulative similarity with all known drugtarget interactions for each unobserved drugtarget interaction. On the other hand, we obtain the signed distance using OCSVM learned from the known interactions with high recall (≥0.95) for each unobserved drugtarget interaction. After normalizing all accumulative similarities and signed distances to the range [0,1], we compute the score for each unobserved drugtarget interaction via averaging its accumulative similarity and signed distance. Unobserved interactions with lower scores are preferentially served as reliable negative samples for the classification algorithms. In the experiment, we investigated how the negative sample ratio level impacts on the prediction performance first. Then we evaluated the performance improvement brought by the constructed negative samples comparing with the case of training on the random negative samples. The comparison experiments were conducted for four classical classifiers and a domain specifically designed predictive model PKM. The extensive experiments demonstrate that the prediction performance has been improved significantly owing to the constructed highlyreliable negative samples.
The proposed method is valuable to both old drug repositioning and new drug discovery. It can guide and speed up the laborious, expensive and tedious experimental identification of drugtarget interactions [22]. In this work, drug chemical structures and protein related GO terms are employed to measure the similarity between drugs and target proteins respectively. We note that more information about drugs (e.g., sideeffects, substituents) and target proteins (e.g., protein sequences) can be utilized to measure more of their similarities. This is an interesting problem which will be studied in our future work.
Availability of data and materials
The data used in this study all are available in the Additional files.
Abbreviations
 AUC:

Area under the receiver operating characteristic curve
 CDK:

Chemistry development kit
 DTP:

Drug target pair
 KNN:

Knearest neighbor
 LR:

Logistic regression), OCSVM: Oneclass support vector machine
 PCN:

Principle component number
 PKM:

Pairwise kernel method
 RF:

Random forest
 SVM:

Support vector machine
References
 1
Chen R, Liu X, Jin S, Lin J, Liu J. Machine learning for drugtarget interaction prediction. Molecules. 2018; 23(9):2208.
 2
Ding H, Takigawa I, Mamitsuka H, Zhu S. Similaritybased machine learning methods for predicting drug–target interactions: a brief review. Brief Bioinforma. 2013; 15(5):734–47.
 3
Bleakley K, Yamanishi Y. Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics. 2009; 25(18):2397–403.
 4
Gönen M. Predicting drug–target interactions from chemical and genomic kernels using bayesian matrix factorization. Bioinformatics. 2012; 28(18):2304–10.
 5
Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, Zhou W, Huang J, Tang Y. Prediction of drugtarget interactions and drug repositioning via networkbased inference. PLoS Comput Biol. 2012; 8(5):1002503.
 6
van Laarhoven T, Nabuurs SB, Marchiori E. Gaussian interaction profile kernels for predicting drug–target interaction. Bioinformatics. 2011; 27(21):3036–43.
 7
Jacob L, Vert JP. Proteinligand interaction prediction: an improved chemogenomics approach. Bioinformatics. 2008; 24(19):2149–56.
 8
Yu H, Chen J, Xu X, Li Y, Zhao H, Fang Y, Li X, Zhou W, Wang W, Wang Y. A systematic prediction of multiple drugtarget interactions from chemical, genomic, and pharmacological data. PloS One. 2012; 7(5):37608.
 9
Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J. A network integration approach for drugtarget interaction prediction and computational drug repositioning from heterogeneous information. Nature Commun. 2017; 8(1):573.
 10
Zhu S, Okuno Y, Tsujimoto G, Mamitsuka H. A probabilistic model for mining implicit ’chemical compound–gene’relations from literature. Bioinformatics. 2005; 21(suppl_2):245–51.
 11
Xiao Y, Wang H, Xu W. Parameter selection of gaussian kernel for oneclass svm. IEEE Trans Cybernet. 2014; 45(5):941–53.
 12
Khan SS, Madden MG. A survey of recent trends in one class classification. In: Irish Conference on Artificial Intelligence and Cognitive Science. Dublin: Springer: 2009. p. 188–97.
 13
Lan W, Wang J, Li M, Liu J, Li Y, Wu FX, Pan Y. Predicting drug–target interaction using positiveunlabeled learning. Neurocomputing. 2016; 206:50–7.
 14
Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, RojasChertó M, Spjuth O, et al. The chemistry development kit (cdk) v2. 0: atom typing, depiction, molecular formulas, and substructure searching. Journal of cheminformatics. 2017; 9(1):33.
 15
Chen B, Wild D, Guha R. Pubchem as a source of polypharmacology. J Chem Inf Model. 2009; 49(9):2044–55.
 16
Kuhn M, Letunic I, Jensen LJ, Bork P. The sider database of drugs and side effects. Nucleic acids research. 2015; 44(Issue D1):1075.
 17
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, et al. Drugbank 4.0: shedding new light on drug metabolism. Nucleic Acids Res. 2013; 42(D1):1091–7.
 18
Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, et al. Drugbank 5.0: a major update to the drugbank database for 2018. Nucleic Acids Res. 2017; 46(D1):1074–82.
 19
Zheng Y, Peng H, Zhang X, Gao X, Li J. Predicting drug targets from heterogeneous spaces using anchor graph hashing and ensemble learning. In: 2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro: IEEE: 2018. p. 1–7.
 20
Consortium GO. The gene ontology (go) database and informatics resource. Nucleic Acids Res. 2004; 32(suppl 1):258–61.
 21
Tabei Y, Pauwels E, Stoven V, Takemoto K, Yamanishi Y. Identification of chemogenomic features from drug–target interaction networks using interpretable classifiers. Bioinformatics. 2012; 28(18):487–94.
 22
Haggarty SJ, Koeller KM, Wong JC, Butcher RA, Schreiber SL. Multidimensional chemical genetic analysis of diversityoriented synthesisderived deacetylase inhibitors using cellbased assays. Chem Biol. 2003; 10(5):383–96.
Acknowledgements
None.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 20 Supplement 23, 2019: Proceedings of the Joint International GIW &; ABACBS2019 Conference: bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume20supplement23.
Funding
This study was supported by a grant from the China Scholarship Council (Grant Number: 201503170244). Publication of this supplement was funded by Faculty of Engineering and Information Technology, University of Technology Sydney.
Author information
Affiliations
Contributions
YZ and JL conceived the work. YZ and HP developed the method. YZ implemented the algorithms. JL and XG supervised the study. YZ, XZ and ZZ wrote the manuscript. All authors revised and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
Jinyan Li is a member of the editorial board (Associate Editor) of BMC Bioinformatics.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Additional file 1
Unobserved drugtarget pairs with their inference scores (ranked in the ascending order).
Additional file 2
The supplementary figures for this work.
∙ Figure S1: The AUC scores of four classifiers on reliable negative samples with different negative sample ratio levels.
∙ Figure S2: F1scores of the proposed method with different PCNs (principle component numbers). The xaxis is the PCA component number and the yaxis is the F1score.
Additional file 3
∙ Table S1: AUC/Precision/recall/F1Score values of four classical classifiers when using reliable, pairwise or randomly generated negative samples.
∙ Table S2: 1094 drugs researched in this work.
∙ Table S3: 1556 targets researched in this work.
∙ Table S4: 11,819 validated drugtarget interactions.
Additional file 4
Supplementary codes and data. The Python codes of the proposed method and the source data.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zheng, Y., Peng, H., Zhang, X. et al. Old drug repositioning and new drug discovery through similarity learning from drugtarget joint feature spaces. BMC Bioinformatics 20, 605 (2019). https://doi.org/10.1186/s128590193238y
Received:
Accepted:
Published:
Keywords
 Drug target prediction
 Reliable negative samples
 Pairwise similarity