Deciding when to stop: efficient experimentation to learn to predict drug-target interactions
Maja Temerinac-Ott,
Armaghan W Naik and
Robert F Murphy
Received: 3 November 2014
Accepted: 26 June 2015
Published: 9 July 2015
Abstract
Background
Active learning is a powerful tool for guiding an experimentation process. Instead of doing all possible experiments in a given domain, active learning can be used to pick the experiments that will add the most knowledge to the current model. In drug discovery and development in particular, active learning has been shown to reduce the number of experiments needed to obtain high-confidence predictions. In practice, however, it is crucial to have a method to evaluate the quality of the current predictions and to decide when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved.
Results
We compute active learning traces on simulated drug-target matrices in order to determine a regression model for the accuracy of the active learner. By analyzing the performance of the regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can save up to 40 % of the total experiments while still yielding highly accurate predictions.
Conclusions
We show that active learning accuracy can be predicted using simulated data, and that doing so yields substantial savings in the number of experiments required to make accurate drug-target predictions.
Background
A critical step in developing new therapeutics is frequently to conduct large-scale searches for potential drugs that can affect a desired target. Recently, it has become clear that finding successful drugs also requires searching for the absence of undesired effects on other targets. This need often cannot be met by exhaustive experimentation due to cost, but selective experimentation driven by machine learning (a process referred to as active learning) may provide an alternative [1]. The heart of active learning is having good predictive models to guide experimentation. Recent studies show that drug-target prediction algorithms can speed up the discovery of new drugs (e.g., [2–5]).
Current drug-target prediction methods are coarse-grained over at most a handful of ‘campaigns’. In these, a classifier is trained with relatively large amounts of training data resulting from exhaustive screening, and then verified on a small test set. These data are generally identified manually and are limited to human ‘expert’ knowledge. This process is generally performed only once, or at most a handful of times, due to the expense of exhaustive screening over many compounds. This procedure limits the generalization capability of the model and does not allow for an optimal exploration of the drug-target interaction space. Alternatively, active learning methods can be used to iteratively build a model of drug-target interactions. Instead of relying on large training data sets, the active learning procedure enlarges the training set stepwise, guided by the predictions on small, automatically selected test sets. Time and experimental costs are thus spent on improving the general model rather than on verifying a small, specific model that does not account for the large space of chemical compounds and targets. The general model has the potential to predict side effects early in the drug design process, since a larger number of drugs and targets are considered in the drug-target prediction matrix.
A critical point when using active learning to guide experimentation is deciding when to stop, since the goal is to obtain the best model with as few experiments as possible. The best stopping time is reached when adding new experiments to the training set will not appreciably improve the accuracy on the test set. The difficulty, of course, is that calculating the true accuracy of the model requires all of the data. Therefore, reliable methods for predicting the accuracy of the current model during an active learning cycle are desired. Such methods would allow experimentation to stop when a predefined confidence in the output of the model is reached.
A natural question is how such an active learning strategy is related to classical statistical approaches [6, 7] that design experiments with incomplete coverage of factors in order to estimate response surfaces. When the model has a large number of parameters (multiple drugs and multiple targets), these methods are very slow, and adapting them to such models is challenging [8]. Furthermore, the most critical difference between an active learning strategy such as the one proposed in our work and the classical statistical design of experiments is that the latter provides guarantees on the concentration of parameters conditional on having observed sufficiently many experiments with particular arrangements, but no guarantees on the optimality of the learned model up to that point. Our goal is to learn the most accurate model possible regardless of the number of experiments performed.
Previous work in drug-target prediction has generally addressed active learning methods or drug-target prediction methods, but rarely both. For example, active learning has been used to identify active compounds from a large pool of compounds targeting a single molecule [9]. Active learning has also been applied in the context of cancer research [10]. Several methods for drug-target prediction without active learning have been proposed recently [11–17], and this remains an active area of research. The focus of this work is not to promote a particular drug-target prediction method, but to show, using matrix factorization as an example, how drug-target prediction can be combined with active learning to reduce experimentation cost. Initial results on applying active learning for drug-target prediction on multiple drugs and multiple targets simultaneously have been reported [18, 19], with and without requiring prior knowledge of drug or target similarities. Dramatic benefits of active learning on a large dataset from PubChem using drug and target similarities have been reported, but without consideration of when to stop experimentation [19]. A method for predicting the accuracy of models learned by active learning, for the purpose of developing a stopping rule, has been described, but it was not applied to the particular problem of drug-target prediction [18].
Several stopping rules for active learning have been considered in the past [20–22]; however, there has been little analysis of which performs best in general. Four simple stopping criteria based on confidence estimation over the unlabeled data pool and the label consistency between neighboring training rounds of active learning have been presented [22]. Instead of using a single criterion to stop, combining different stopping criteria in a feature vector describing the active learning trajectory has been proposed [18]. The features of trajectories on simulated data are used to train a regression function in order to predict the accuracy of active learning algorithms on unseen simulated data. Here we follow this approach and adapt it to the binary drug-target prediction case.
The major goals of our active learning system are: (1) We want to have a fast and reliable method to elucidate drug-target interactions. (2) Previous knowledge on similarities between drugs and similarities between targets should be included in the model, so that predictions for new drugs or targets (for which no experiments are available) are possible. (3) The number of experiments required to make confident predictions should be systematically reduced. (4) An efficient stopping rule for ending the active learning process should be designed.
Previously, kernel-based matrix factorization [23] has been shown to provide good models of drug-target interactions [24]. In the kernelized Bayesian matrix factorization (KBMF) algorithm [24, 25], the drug-target interaction matrix is factorized by projecting the drugs and the targets into a common subspace, where the projected drug matrix and the projected target matrix can be multiplied in order to produce a prediction for the drug-target interaction matrix. The entries of the prediction matrix are modeled using truncated normal distributions. The projected drug matrix and target matrix are based on two different kernels: a drug-specific kernel and a target-specific kernel. Each kernel encodes the pairwise similarity among the drugs or among the targets, so prior information can easily be inserted into the model. Furthermore, knowledge of the full interaction matrix is not needed in order to make predictions for new drugs, which is not the case for previous methods (e.g., [12]).
The main contributions of this work are: (i) We use KBMF to construct a powerful and practical active learning strategy for analyzing drug-target interactions. (ii) We extend previous work [18] on estimating the accuracy of active learning predictions to the KBMF case and show how it can be used to construct a stopping rule for experimentation. (iii) We provide a proof of concept through evaluation of the method on four data sets previously used for modeling of drug-target interactions [26]. (iv) We show the superiority of the proposed active learning approach compared to random choice of an equivalent number of experiments.
Methods
Active learning framework
Data representation
We use interaction matrices \(\mathbf{Y} \in \{-1,1\}^{N \times M}\) to represent drug-target interactions. We assume that the outcome of an experiment determines the ground truth label \(l \in \mathcal{L} = \{-1,1\}\) for an interaction matrix entry. \(N \in \mathbb{N}\) is the number of drugs and \(M \in \mathbb{N}\) is the number of targets. Knowledge of the interaction between a drug \(d \in \{1,2,\ldots,N\}\) and a target \(t \in \{1,2,\ldots,M\}\) is ternary encoded in the experimental matrix \(\mathbf{X}\): +1 for an interaction, −1 for lack of interaction, and 0 to denote experiments which have not yet been performed. The set of remaining experiments (unlabeled data) is denoted by \(\mathcal{X} = \{x = (d,t) \mid \mathbf{X}(x) = 0\}\). We therefore consider a semi-supervised binary labeling problem in which the sign of the label indicates the interaction status between a drug and a target.
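As a minimal illustration of this encoding, the following sketch builds a made-up experimental matrix for three drugs and four targets and lists the experiments that are still open. The toy values and the helper name `unlabeled_pairs` are assumptions introduced here for illustration and are not part of the original work; note also that the code uses 0-based indices, whereas the text counts drugs and targets from 1.

```python
# Ternary experimental matrix X for N = 3 drugs (rows) and M = 4 targets
# (columns), with made-up values:
# +1 = interaction observed, -1 = no interaction observed, 0 = not yet tested.
X = [
    [ 1,  0, -1,  0],
    [ 0,  0,  1, -1],
    [-1,  0,  0,  0],
]


def unlabeled_pairs(X):
    """Return all (drug, target) index pairs whose experiment is still open."""
    return [
        (d, t)
        for d, row in enumerate(X)
        for t, value in enumerate(row)
        if value == 0
    ]


remaining = unlabeled_pairs(X)
print(len(remaining))  # 7 of the 12 possible experiments are still open
```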
Kernelized Bayesian matrix factorization (KBMF)
We use drug and target kernel matrices to represent the pairwise similarity of drugs to one another and the pairwise similarity of targets to one another, respectively. These similarities are values between zero and one, where zero indicates no similarity and one indicates the highest similarity. All the values on the diagonal of a kernel are therefore one. In order to compute the similarities for the target kernel matrix we use the normalized Smith-Waterman score [27], which uses the sequence information of two proteins to compute similarities. Another possibility for computing the similarity between proteins is to first compute features using programs like ProtParam [28] or Prosite [29], as employed previously [19], and then compute the similarity between the features using a distance metric. For computing the similarity between drugs we used SIMCOMP [30], a program which represents drugs as graphs and computes the similarity between two drugs by searching for the maximal common subgraph isomorphism. Other tools to compute the similarity between drugs are included in the OpenBabel package [31].
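Following the KBMF formulation [24, 25], the matrix of predicted interaction scores \(\mathbf{F} \in \mathbb{R}^{N \times M}\) is the product of the projected drug kernel and the projected target kernel,
\[\mathbf{F} = (\mathbf{A_d}^{\top}\mathbf{K_d})^{\top}(\mathbf{A_t}^{\top}\mathbf{K_t}),\]
where \(\mathbf{A_d} \in \mathbb{R}^{N \times R}\) and \(\mathbf{A_t} \in \mathbb{R}^{M \times R}\) are subspace transformation matrices computed by the variational Bayes algorithm [24, 25] using the values of the experimental matrix \(\mathbf{X}\). The dimension \(R\) of the subspace is a free parameter; we used the value of 20 previously determined to be optimal for these datasets [25]. The entries of the kernel matrices \(\mathbf{K_d} \in \mathbb{R}^{N \times N}\) and \(\mathbf{K_t} \in \mathbb{R}^{M \times M}\) measure the pairwise similarities between drugs and between targets, respectively. The similarity matrices provided by Yamanishi et al. [26] and the KBMF implementation of semi-supervised classification provided by Gönen [25, 32] were used.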
Note that it is not possible to factor the interaction matrix \(\mathbf{Y}\) by multiplying the drug and target kernels directly, since they are matrices of differing dimension. Therefore transformation matrices \(\mathbf{A_d}\) and \(\mathbf{A_t}\) are needed which project the drug kernel and the target kernel into a common subspace. Since the product of the transformed kernels \(\mathbf{F}\) should reflect the observed experiments as well as possible, the values of \(\mathbf{A_t}\) and \(\mathbf{A_d}\) are found such that they maximize the posterior probability of having observed the experimental matrix \(\mathbf{X}\) along with some prior information on the distribution of the elements in the transformation matrices. Gönen [24, 25] used a graphical model to represent the relationships, and provided a detailed derivation of an efficient inference scheme using variational approximation. The KBMF algorithm is an iterative algorithm which usually converges after 200 iterations. The values of the kernels do not necessarily have to be in the range zero to one, since the scaling of the kernels is implicitly encoded in the transformation matrices.
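The role of the transformation matrices can be illustrated with a small dimension check. The sketch below assumes the score matrix takes the form \(\mathbf{F} = (\mathbf{A_d}^{\top}\mathbf{K_d})^{\top}(\mathbf{A_t}^{\top}\mathbf{K_t})\), which is consistent with the dimensions stated above. The transformation matrices are picked by hand for illustration; in KBMF they would be inferred by variational Bayes, which is not shown here. All function names and toy values are hypothetical.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    B_cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def predict_scores(K_d, A_d, K_t, A_t):
    """Assumed form F = (A_d^T K_d)^T (A_t^T K_t), an N x M score matrix."""
    G_d = matmul(transpose(A_d), K_d)  # R x N projected drug kernel
    G_t = matmul(transpose(A_t), K_t)  # R x M projected target kernel
    return matmul(transpose(G_d), G_t)


# Toy setting: N = 2 drugs, M = 3 targets, subspace dimension R = 1.
K_d = [[1.0, 0.5], [0.5, 1.0]]                           # N x N drug kernel
K_t = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # M x M target kernel
A_d = [[1.0], [0.0]]                                     # N x R, hand-picked
A_t = [[1.0], [-1.0], [0.0]]                             # M x R, hand-picked

F = predict_scores(K_d, A_d, K_t, A_t)
print(F)  # an N x M matrix, here 2 x 3
```

The two kernels have different sizes (2×2 and 3×3) and cannot be multiplied directly, but after projection into the shared one-dimensional subspace their product has exactly the shape of the interaction matrix.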
Initialization and experiment selection
Our initialization strategy is to select a random column and one random experiment from each row of the experimental matrix X.
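One way to read this initialization is sketched below: every entry of one randomly chosen column is measured, and in addition one randomly chosen entry is measured in each row. This reading, the function name `initial_experiments` and the fixed random seed are assumptions made for illustration only.

```python
import random


def initial_experiments(n_drugs, n_targets, rng):
    """Pick one full random column plus one random entry in every row."""
    column = rng.randrange(n_targets)
    chosen = {(d, column) for d in range(n_drugs)}  # the whole random column
    for d in range(n_drugs):
        # One additional random experiment per row; it may coincide with
        # the chosen column, in which case the set does not grow.
        chosen.add((d, rng.randrange(n_targets)))
    return chosen


rng = random.Random(0)  # fixed seed so that the sketch is reproducible
start = initial_experiments(n_drugs=5, n_targets=8, rng=rng)
print(sorted(start))
```

Under this reading the initial set contains between N and 2N experiments, and every drug has at least one measured target.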
Uncertainty sampling
where \(\mathcal{L} = \{-1,1\}\) is the set of possible labels and \(l\) is a label.
and \(P(l=-1 \mid x) = 1 - P(l=1 \mid x)\) for no interaction, respectively.
Here we make use of the property of the KBMF method that the magnitude of the predicted entry in \(\mathbf{F}\) is an indicator of the confidence of the prediction.
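A minimal sketch of the selection step, under the assumption that the least confident prediction is simply the unlabeled entry whose predicted score has the smallest magnitude. The mapping from scores to label probabilities is not reproduced here; the function name `most_uncertain` and the toy values are hypothetical.

```python
def most_uncertain(F, X):
    """Return the untested (drug, target) pair with the smallest |F| value.

    F holds the predicted scores and X the ternary experimental matrix
    (0 = not yet tested). Returns None when every experiment has been done.
    """
    candidates = [
        (abs(F[d][t]), d, t)
        for d, row in enumerate(X)
        for t, value in enumerate(row)
        if value == 0
    ]
    if not candidates:
        return None
    _, d, t = min(candidates)
    return (d, t)


# Toy scores and observations for 2 drugs and 3 targets.
F = [[0.9, -0.1, 0.4], [-0.7, 0.05, -0.3]]
X = [[1, 0, 0], [0, -1, 0]]

print(most_uncertain(F, X))  # (0, 1)
```

The entry with score 0.05 has the smallest magnitude overall but has already been measured, so the next experiment is the untested entry with score −0.1.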
Stopping rule
where uRows(·) and uColumns(·) compute the number of unique rows and unique columns of a matrix.
The uniqueness and responsiveness are values in the range [0,1] and characterize the interaction matrix. Responsiveness measures the percentage of interactions in the matrix. Uniqueness is a measure of independence of the rows and columns in the matrix. The higher the value for uniqueness is, the more difficult it is to make predictions.
These two measures have two purposes: (1) They are used to compute features for a timestep in our current active learning process. (2) They can be used to generate simulation data having similar properties to the measured experimental data.
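The building blocks of these two measures can be sketched as follows. The code computes the responsiveness as the fraction of interacting entries and counts unique rows and columns; how the two counts are combined into the uniqueness value is not reproduced here. The function names and the toy matrix are assumptions made for illustration.

```python
def u_rows(Y):
    """Number of distinct rows of a matrix stored as a list of rows."""
    return len({tuple(row) for row in Y})


def u_columns(Y):
    """Number of distinct columns of a matrix stored as a list of rows."""
    return len({tuple(col) for col in zip(*Y)})


def responsiveness(Y):
    """Fraction of entries labeled as interacting (+1)."""
    n_entries = sum(len(row) for row in Y)
    n_interactions = sum(value == 1 for row in Y for value in row)
    return n_interactions / n_entries


# Toy interaction matrix: 3 drugs x 3 targets; the first two drugs behave identically.
Y = [
    [ 1, -1, 1],
    [ 1, -1, 1],
    [-1, -1, 1],
]

print(u_rows(Y), u_columns(Y), responsiveness(Y))
```

Here two of the three rows are identical, so knowing one of them suffices to predict the other, which is the intuition behind lower uniqueness making prediction easier.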

f(1),f(2): average observed responsiveness across columns (respectively rows)

f(3),f(4): average predicted responsiveness across columns (respectively rows)

f(5): average difference in predictions from last prediction for current time point (\(t_{i}\))

f(6): average difference in predictions from last prediction for previous time point (\(t_{i-1}\))

f(7): fraction of predictions at \(t_{i-1}\) observed as responsive (\(l=1\)) at \(t_{i}\)

f(8),f(9),f(10): minimum, maximum and mean number of experiments that have been performed for any drug

f(11),f(12),f(13): minimum, maximum and mean number of experiments that have been performed for any target
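The coverage features f(8) to f(13) depend only on which experiments have been performed, so they can be sketched directly from the experimental matrix. The sketch assumes drugs are stored as rows and targets as columns; the function name `coverage_features` and the toy matrix are hypothetical.

```python
from statistics import mean


def coverage_features(X):
    """Min, max and mean number of performed experiments per drug and per target.

    X is the ternary experimental matrix (rows = drugs, columns = targets);
    an entry counts as performed when it is non-zero.
    """
    per_drug = [sum(value != 0 for value in row) for row in X]
    per_target = [sum(value != 0 for value in col) for col in zip(*X)]
    return {
        "f8": min(per_drug), "f9": max(per_drug), "f10": mean(per_drug),
        "f11": min(per_target), "f12": max(per_target), "f13": mean(per_target),
    }


X = [
    [ 1,  0, -1,  0],
    [ 0,  0,  1, -1],
    [-1,  0,  0,  0],
]

features = coverage_features(X)
print(features)
```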
where N _{ f } is the number of observations used, \(\tilde {p}=0.5\cdot p \cdot (p+1)\) and t≥0 is a tuning parameter. We use lasso regression [34] to learn the vector of response coefficients \(\mathbf {\beta } \in \mathbb {R}^{\tilde {p}}\).
To learn the accuracy predictor via simulation data, interaction matrices of size 50×50 were randomly sampled on the grid of uniqueness and responsiveness parameters 5 %, 10 %, …, 95 %. For each interaction matrix we derived ‘perfect’ Gaussian similarity kernels \(\mathbf{K_d}, \mathbf{K_t}\) from pairwise distances in the column space and row space, respectively. These were disrupted by forcing 0 %, 5 % or 10 % of the kernel entries to the value 1 and regularized to ensure positive semi-definiteness. Features computed from trajectories of the uncertainty sampling active learner on these data were collected; for each trajectory we also measured the accuracy of prediction against the ground truth. A linear model of these features against adjusted accuracies (accuracy above the fraction of experiments performed so far) was fitted by lasso regression [34]. The lasso regularization parameter was chosen by 11-fold cross-validation under squared loss, with hold-out granularity at the level of trajectories. To make accuracy predictions from adjusted accuracy predictions, we added the fraction of experiments performed so far.
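The last step, and its use for stopping, can be sketched as follows. The regression itself is not shown: `adjusted_prediction` stands for the output of the fitted lasso model at the current time step. Stopping as soon as the predicted accuracy reaches a fixed threshold is an assumption made for this sketch; the default of 0.9 mirrors the PA(0.9) setting compared later in the article, and the function names and toy numbers are hypothetical.

```python
def predicted_accuracy(adjusted_prediction, n_performed, n_total):
    """Add back the fraction of experiments already performed."""
    return adjusted_prediction + n_performed / n_total


def should_stop(adjusted_prediction, n_performed, n_total, threshold=0.9):
    """Stop once the predicted accuracy reaches the chosen threshold."""
    return predicted_accuracy(adjusted_prediction, n_performed, n_total) >= threshold


# Toy numbers for a 50 x 50 matrix (2500 possible experiments).
early = should_stop(0.55, 500, 2500)  # predicted accuracy 0.75
later = should_stop(0.55, 900, 2500)  # predicted accuracy about 0.91
print(early, later)  # False True
```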
Results
For evaluation of our method, experiments were performed on four data sets extracted from the KEGG BRITE [35], BRENDA [36], SuperTarget [37] and DrugBank [38] databases, previously described by Yamanishi et al. [26, 39]. The collection consists of four drug-target interaction matrices: Nuclear Receptor, GPCR, Ion Channel and Enzyme. For this evaluation, we considered what would constitute a fair comparison with how a multiple drug, multiple target screening process might be carried out using current practice. Most decisions about experimental choice are currently made by investigators based on their prior knowledge. However, we are not aware of any study where human investigators have been asked to choose experiments in a multiple drug, multiple target scenario. This would be difficult, since investigators would typically not have sufficient knowledge of hundreds of targets to carry this out; most investigators have expertise for a limited number of targets. The closest strategy used in practice would be to choose drugs for each target independently, either using multiple experts or using a strategy such as Quantitative Structure-Activity Relationship (QSAR) modeling [40]. We have previously shown that an AL-based strategy outperforms a single-target strategy carried out by a panel of experts using QSAR [19]. Therefore, we decided to compare our strategy with random sampling, since random selection is often difficult to improve upon [41].
Comparison of active and random learning strategies
Predicting the accuracy of the model
Learning the stopping rule
Applying the stopping rule
Table 1. Average AUC on held-out data and percentage of experiments after applying our stopping rule. The average AUC was obtained on held-out data using 80 % of the data for training. Random sampling of the training data [24] is compared with sampling the training data by active learning (AL) and with sampling by preclustering of the drugs. Furthermore, the average AUC obtained by training with only the listed percentage of experiments, as determined by the stopping rule, is provided. The percentage of experiments can be halved by using the proposed stopping rule
Dataset | Gönen results, AUC (%) | Preclustering, AUC (%) | AL, AUC (%) | With stopping rule, AUC (%) | With stopping rule, experiments (%)
NR | 82.4 | 84.0 | 93.6 | 81.7 | 52.9
GPCR | 85.7 | 86.4 | 90.6 | 81.6 | 39.3
IC | 79.9 | 85.3 | 86.8 | 83.8 | 44.2
Enz | 83.2 | 85.8 | 90.3 | 77.8 | 29.7
We also tested whether simply clustering drugs according to their similarity could lead to a better training set (similar to the strategy of identifying a ‘representative set’ of drugs for screening). We applied k-means clustering to the drug similarity matrix, and the number of clusters was chosen using the Akaike information criterion. A set of drugs was chosen to maximize representation of this clustering, either of a fixed size of 80 % or of the same size as that found for a particular dataset by active learning. This approach performs slightly better than random choice of drugs but not as well as active learning selection (Table 1).
Comparison of stopping rules
Average difference between the best stopping time (BST) point and the stopping point chosen by various stopping rules. OU = Overall Average Uncertainty, MEE = Minimum Expected Error, PA = Predicted Accuracy. The value in parentheses denotes the threshold. The smaller the difference \(\Delta_{ave}\), the better the stopping criterion. The proposed method (PA) with threshold 0.9 performed best
Methods | OU(0.12) | OU(0.09) | OU(0.06) | OU(0.03) | OU(adapted)
\(\Delta_{ave}\) (%) | 40.1 (± 12.2) | 33.8 (± 17.8) | 40.1 (± 21.3) | 50.9 (± 5.4) | 28.2 (± 29.1)

Methods | MEE(0.12) | MEE(0.09) | MEE(0.06) | MEE(0.03) | MEE(adapted)
\(\Delta_{ave}\) (%) | 40.1 (± 11.7) | 38.3 (± 12.7) | 36.1 (± 13.4) | 40.6 (± 12.1) | 30.3 (± 12.6)

Methods | PA(0.85) | PA(0.9) | PA(0.95)
\(\Delta_{ave}\) (%) | 32.8 (± 8.8) | 13.7 (± 11.3) | 22.1 (± 15.4)
Discussion and conclusions
We have presented an active learning method for prediction of drug-target interactions based on kernelized matrix factorization. Building on prior work [24], our model can efficiently leverage prior information through kernels to achieve high predictive accuracy. We have furthermore shown that our method can significantly improve the prediction task for drug-target interactions when only a limited number of experiments can be performed. For three real-world data sets with high uniqueness values, the active learning strategy achieves 99 % accuracy with 23 times fewer experiments than a random sampling strategy. It is important to note that our goal was not to choose the best possible matrix completion method for these specific datasets, but to show that a good method can be used as a basis for active learning to dramatically reduce further experimentation.
It should therefore be emphasized that the presented framework is not limited to KBMF. Any other model for drug-target prediction could be applied, provided that it produces drug-target scores which can be converted into probabilities. Furthermore, the selection strategy we used (uncertainty sampling) could be replaced by any other active learning strategy (e.g., diversity sampling) to learn new traces on simulated data. Presumably, the regression model for predicting accuracy from simulated active learning traces could also be improved.
For a practitioner to realize these advantages, we have provided a method for estimating the accuracy of an actively learned model using only experimental results already collected; this estimated accuracy is generally a lower bound of the true accuracy of the model. We have shown that this method, calibrated from simulation data, accurately assesses the active learner performance on our real-world data. We have also shown that by applying a stopping rule learned on the simulated data, only half of the experiments are needed to achieve similar accuracies on hold-out data. We conclude that active learning driven experimentation is a practical solution to large experimental problems in which time or expense make exhaustive experimentation undesirable.
Declarations
Acknowledgements
This study was supported by BMBF e:BIO grant ’Microsystems’ FKZ0316185. The article processing charge was funded by the German Research Foundation (DFG) and the Albert Ludwigs University Freiburg in the funding programme Open Access Publishing by the Albert Ludwigs University Freiburg. This paper was selected for oral presentation at RECOMB 2015 and an abstract is published in the conference proceedings.
References
1. Murphy RF. An active role for machine learning in drug development. Nat Chem Biol. 2011; 7:327–30.
2. Besnard J, Ruda GF, Setola V, Abecassis K, Rodriguiz RM, Huang XP, et al. Automated design of ligands to polypharmacological profiles. Nature. 2012; 492:215–20.
3. Paolini GV, Shapland RHB, van Hoorn WP, Mason JS, Hopkins AL. Global mapping of pharmacological space. Nat Biotechnol. 2006; 24:805–15.
4. Reymond JL, van Deursen R, Blum LC, Ruddigkeit L. Chemical space as a source for new drugs. MedChemComm. 2010; 1:30–8.
5. Keiser MJ, Setola V, Irwin JJ, Laggner C, Abbas AI, Hufeisen SJ, et al. Predicting new molecular targets for known drugs. Nature. 2009; 462:175–81.
6. Box GEP, Wilson KB. On the experimental attainment of optimum conditions. J R Stat Soc Ser B (Methodol). 1951; 13:1–45.
7. John PWM. An application of a balanced incomplete block design. Technometrics. 1961; 3:51–4.
8. Schein AI, Ungar LH. Active learning for logistic regression: An evaluation. Mach Learn. 2007; 68:235–65.
9. Warmuth MK, Liao J, Rätsch G, Mathieson M, Putta S, Lemmen C. Active learning with support vector machines in the drug discovery process. J Chem Inf Comput Sci. 2003; 43:667–73.
10. Danziger SA, Zeng J, Wang Y, Brachmann RK, Lathrop RH. Choosing where to look next in a mutation sequence space: Active learning of informative p53 cancer rescue mutants. Bioinformatics. 2007; 23(13):104–14.
11. Yamanishi Y, Kotera M, Kanehisa M, Goto S. Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework. Bioinformatics. 2010; 26:246–54.
12. Atias N, Sharan R. An algorithmic framework for predicting side effects of drugs. J Comput Biol. 2011; 18:207–18.
13. Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P. Drug target identification using side-effect similarity. Science. 2008; 321:263–6.
14. Alaimo S, Pulvirenti A, Giugno R, Ferro A. Drug-target interaction prediction through domain-tuned network-based inference. Bioinformatics. 2013; 29(16):2004–8.
15. Cheng F, Liu C, Jiang J, Lu W, Li W, Liu G, et al. Prediction of drug-target interactions and drug repositioning via network-based inference. PLoS Comput Biol. 2012; 8(5):e1002503.
16. Zheng X, Ding H, Mamitsuka H, Zhu S. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013. Chicago, IL, USA: 2013. p. 1025–1033.
17. Bleakley K, Yamanishi Y. Supervised prediction of drug-target interactions using bipartite local models. Bioinformatics. 2009; 25(18):2397–403. doi:10.1093/bioinformatics/btp433.
18. Naik AW, Kangas JD, Langmead CJ, Murphy RF. Efficient modeling and active learning discovery of biological responses. PLoS ONE. 2013; 8(12):83996.
19. Kangas JD, Naik AW, Murphy RF. Efficient discovery of responses of proteins to compounds using active learning. BMC Bioinformatics. 2014; 15:143.
20. Laws F, Schütze H. Stopping criteria for active learning of named entity recognition. In: Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1. COLING ’08. Stroudsburg, PA, USA: Association for Computational Linguistics: 2008. p. 465–72. http://dl.acm.org/citation.cfm?id=1599081.1599140.
21. Vlachos A. A stopping criterion for active learning. Comput Speech Lang. 2008; 22(3):295–312.
22. Zhu J, Wang H, Hovy E, Ma M. Confidence-based stopping criteria for active learning for data annotation. ACM Trans Speech Lang Process. 2010; 6(3):3–1324.
23. Bazerque JA, Giannakis GB. Nonparametric basis pursuit via sparse kernel-based learning. IEEE Signal Proc Mag. 2013; 30:112–25.
24. Gönen M. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics. 2012; 28:2304–310.
25. Gönen M, Khan SA, Kaski S. Kernelized Bayesian matrix factorization. In: Proceedings of the 30th International Conference on Machine Learning, ICML 2013. Atlanta, GA, USA: 2013. p. 864–72.
26. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics. 2008; 24:232–40.
27. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981; 147:195–7.
28. Wilkins MR, Gasteiger E, Bairoch A, Sanchez W, Williams KL, Appel RD, et al. Protein identification and analysis tools in the ExPASy server. Meth Mol Biol. 1999; 112:531–5.
29. de Castro E, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, et al. ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 2006; 34(suppl 2):362–5.
30. Hattori M, Tanaka N, Kanehisa M, Goto S. SIMCOMP/SUBCOMP: chemical structure search servers for network analyses. Nucleic Acids Res. 2010; 38:652–6.
31. O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR. Open Babel: An open chemical toolbox. J Cheminformatics. 2011; 3:33.
32. Gönen M. KBMF: Kernelized Bayesian Matrix Factorization. http://research.ics.aalto.fi/mi/software/kbmf/. Accessed: 2015-04-24.
33. Lewis D, Gale W. A sequential algorithm for training text classifiers. In: Proceedings of the 17th annual international ACM SIGIR conference on Research and development in Information Retrieval. New York, NY, USA: Springer-Verlag New York, Inc.: 1994. p. 3–12.
34. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol). 1996; 58:267–88.
35. Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006; 34:354–7.
36. Schomburg I, Chang A, Ebeling C, Gremse M, Heldt C, Huhn G, et al. BRENDA, the enzyme database: Updates and major new developments. Nucleic Acids Res. 2004; 32:431–3.
37. Günther S, Kuhn M, Dunkel M, Campillos M, Senger C, Petsalaki E, et al. SuperTarget and Matador: Resources for exploring drug-target relationships. Nucleic Acids Res. 2008; 36:919–22.
38. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008; 36:901–6.
39. Yamanishi Y, Araki M, Gutteridge A, Honda W, Kanehisa M. Prediction of Drug-target Interaction Networks from the Integration of Chemical and Genomic Spaces. http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/. Accessed: 2015-04-24.
40. Leonard JT, Roy K. On selection of training and test sets for the development of predictive QSAR models. QSAR Combinatorial Sci. 2006; 25:235–51.
41. Hanneke S. Activized learning: Transforming passive to active with improved label complexity. J Mach Learn Res. 2012; 13:1469–587.
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.