Drug response prediction using graph representation learning and Laplacian feature selection
BMC Bioinformatics volume 23, Article number: 532 (2022)
Abstract
Background
Knowing how a patient responds to drugs is essential to making personalized medicine practical. Since current clinical drug response experiments are time-consuming and expensive, utilizing human genomic information and drug molecular characteristics to predict drug responses is of urgent importance. Although a variety of computational drug response prediction methods have been proposed, their effectiveness is still not satisfactory.
Results
In this study, we propose a method called LGRDRP (Learning Graph Representation for Drug Response Prediction) to predict cell line-drug responses. First, LGRDRP constructs a heterogeneous network integrating multiple kinds of information: cell line miRNA expression profiles, drug chemical structure similarity, gene-gene interaction, cell line-gene interaction and known cell line-drug responses. Then, for each cell line, learning graph representation and Laplacian feature selection are combined to obtain network topology features related to the cell line. The learning graph representation method learns network topology structure features, and the Laplacian feature selection method then selects the most important ones among them. Finally, LGRDRP trains an SVM model to predict drug responses based on the selected features of the known cell line-drug responses. Our five-fold cross-validation results show that LGRDRP is significantly superior to the state-of-the-art methods in terms of the average area under the receiver operating characteristic curve, the average area under the precision-recall curve and the recall rate of top-k predicted sensitive cell lines.
Conclusions
Our results demonstrate that using multiple types of information about cell lines and drugs, together with the learning graph representation method and Laplacian feature selection, improves performance in predicting drug responses. We believe that such an approach could be readily extended to similar problems such as miRNA-disease relationship inference.
Background
Personalized medicine focuses on finding appropriate drugs for individual patients. Since the same drugs have different effects on different patients, knowing each individual's responses to drugs is a prerequisite of personalized medicine [1]. Since clinical drug response experiments are time-consuming and expensive, computational drug response prediction methods based on the related information of drugs and cell lines are of urgent practical importance and have attracted many researchers [2]. A variety of drug response prediction methods have been proposed, and they are mainly based on existing biological databases [3], of which Genomics of Drug Sensitivity in Cancer (GDSC) [4] and Cancer Cell Line Encyclopedia (CCLE) [5] are the two best known. GDSC contains known cancer cell-drug responses and the corresponding cell lines’ profiles [4]. CCLE provides public access to the gene expression, gene methylation and mutation data of over 1100 cell lines [5]. These databases provide researchers with benchmark data to test drug response prediction methods.
Based on cell line gene expression data, Torkamani et al. [6] used PCA to extract gene expression features and constructed a linear regression model to predict drug responses. Gupta et al. [7] proposed a prediction model based on genomic characteristics such as copy number variations of cancer cell lines. Based on the CCLE dataset, Fang et al. [8] used a quantile regression forest method to predict drug response. Based on a support vector machine and a recursive feature selection tool, Dong et al. [9] used the gene expression and drug sensitivity data in CCLE to build a drug response predictor. Using the same data set, Geeleher et al. [10] proposed a ridge regression prediction model. Liu et al. [11] proposed an ensemble learning method that integrated a low-rank matrix completion model and a ridge regression model to predict drug responses. By integrating the pathways of drug targets and the related gene sets, Ammad-ud-din et al. [12] proposed a kernelized Bayesian matrix factorization with component-wise multiple kernel learning to predict drug responses.
Based on the gene expression features of cell lines and the chemical features of drugs, Li et al. [13] developed a deep learning architecture to learn a prediction model. Yan et al. [14] proposed an interpretable model to predict drug responses, which integrated drug features, cell line features and drug responses using triple matrix factorization, and Guvenc Paltun et al. [15] proposed a framework of Bayesian importance-weighted tri-matrix and two-matrix factorization to predict drug responses. These methods mainly considered the basic information of cell lines and drugs, and obtained good prediction performance for certain drugs. However, they neglected other useful information such as the relationships between different cell lines and the relationships between different drugs [16].
Based on the assumption that similar cell lines tend to respond similarly to similar drugs, many network-based drug response prediction methods have been proposed recently. For example, after integrating different kinds of information such as gene mutation, DNA copy number and mRNA expression data of cell lines and compound molecular properties, ATC codes and side effects of drugs, Wang et al. [17] built similarity networks for cell lines and drugs, and proposed an SVM classifier model. Stanfield et al. [18] integrated cell line gene mutations, known cell line-drug responses and protein-protein interactions (PPIs), and built a heterogeneous network consisting of the genes, cell lines and drugs. They utilized a random walk with restart (RWR) in the network to predict drug responses. Similarly, Zhang et al. [19] used cell line gene expression data to build a cell line similarity network, used drug chemical structures to build a drug similarity network, and used PPIs to build a gene-gene interaction network. They combined the networks with known cell line-drug associations and drug-target (gene) interactions into a heterogeneous network and proposed a prediction model. Based on a cell line similarity network and a drug similarity network, Liu et al. [20] adopted a neighbor-based collaborative filtering with global effect removal method, Zhang et al. [21] adopted a hybrid interpolation weighted collaborative filtering method, and Guan et al. [22] utilized weighted graph regularized matrix factorization to predict drug responses.
In most network-based drug response prediction methods, heterogeneous data are integrated using a weighted graph (i.e., a network), so capturing the useful topological information of the graph is important for the effectiveness of drug response prediction. Recently, many learning graph representation methods have been introduced. GraRep [23] is a state-of-the-art one that learns a global representation of a graph; the representation contains the topological information of the graph and is convenient to use as input features for machine learning methods.
In this paper, like most existing methods, we formulate drug response prediction as a classification task: for each drug, classify cell lines into two groups, sensitive and resistant, according to the cell lines’ features. To improve drug response prediction performance, we integrate several available related data, such as known cell line-drug associations, miRNA expression profiles of cell lines, chemical structures of drugs, PPIs, cell line gene sequence variations and hypermethylation information, into a heterogeneous network. Then GraRep [23] and Laplacian feature selection are used to learn the cell lines’ features in the network and to reduce the number of features, respectively. Finally, an SVM model [24] for the classification task is trained on the resulting features. Our method is called LGRDRP (a Learning Graph Representation method for Drug Response Prediction) and is illustrated in Fig. 1.
Results and discussion
We conducted a series of 5-fold cross-validation experiments to test the performance of LGRDRP and some other state-of-the-art drug response prediction methods. The cross-validations were done for each drug. When a query drug was selected, all the cell lines with known responses (sensitive or resistant) to the drug were randomly divided into 5 groups. We randomly selected one group as the test data and used the other four as the training data. The heterogeneous network of the training data was obtained by removing the edges between the query drug vertex and the test cell line vertices.
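The fold construction and edge removal described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the edge-set representation and the toy identifiers are hypothetical and are not taken from the authors' implementation.

```python
import random


def split_into_folds(cell_lines, n_folds=5, seed=0):
    """Randomly partition cell lines with known responses into n_folds groups."""
    shuffled = list(cell_lines)
    random.Random(seed).shuffle(shuffled)
    # Striding through the shuffled list gives groups whose sizes differ by at most one.
    return [shuffled[i::n_folds] for i in range(n_folds)]


def training_edges(response_edges, query_drug, test_cell_lines):
    """Remove the edges between the query drug and the held-out test cell lines."""
    hidden = set(test_cell_lines)
    return {
        (drug, cell)
        for drug, cell in response_edges
        if not (drug == query_drug and cell in hidden)
    }


# Toy data: the names are placeholders, not entries from GDSC.
cells = [f"cell_{i}" for i in range(10)]
folds = split_into_folds(cells)
edges = {("drug_A", "cell_0"), ("drug_A", "cell_1"), ("drug_B", "cell_0")}
kept = training_edges(edges, "drug_A", ["cell_0"])
```

Only edges incident to the query drug are hidden, so the test cell lines keep their similarity, gene and other drug edges in the training network.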
For each cell line, a drug response prediction method calculated a score, and the test cell lines were sorted according to their scores. Given a fixed threshold, a cell line whose score is below the threshold is labeled negative (resistant): it is a false negative if it is known to be sensitive to the query drug, and a true negative if it is known to be resistant. A cell line whose score is equal to or above the threshold is labeled positive (sensitive): it is a true positive if it is known to be sensitive to the query drug, and a false positive if it is known to be resistant. The true positive rate (TPR), the false positive rate (FPR), the precision ratio (Prec) and the recall ratio (Rec) are computed as follows: TPR = TP/(TP + FN), FPR = FP/(FP + TN), Prec = TP/(TP + FP), Rec = TP/(TP + FN), where TP, FN, FP and TN are the numbers of cell lines that are true positive, false negative, false positive and true negative, respectively.
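As a small worked example of these four ratios, the sketch below counts TP, FP, TN and FN at one threshold. The function name, the 1/0 label encoding and the toy scores are assumptions made for illustration.

```python
def confusion_rates(scores, labels, threshold):
    """Return TPR, FPR, Prec and Rec; labels are 1 (sensitive) or 0 (resistant)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        if score >= threshold:  # predicted positive (sensitive)
            if label == 1:
                tp += 1
            else:
                fp += 1
        else:  # predicted negative (resistant)
            if label == 1:
                fn += 1
            else:
                tn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "Prec": ratio(tp, tp + fp),
        "Rec": ratio(tp, tp + fn),
    }


# Five toy cell lines; at threshold 0.5 there is 1 TP, 1 FP, 1 FN and 2 TN.
rates = confusion_rates([0.9, 0.8, 0.4, 0.3, 0.1], [1, 0, 1, 0, 0], threshold=0.5)
```

For this toy input the function gives TPR = Rec = 0.5, FPR = 1/3 and Prec = 0.5.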
As the threshold increases from the smallest score to the highest score, a receiver operating characteristic (ROC) curve is drawn with the varying FPRs as X-axis values and the varying TPRs as Y-axis values.
The area under the ROC curve (AUC) is calculated to evaluate the prediction performance. Since the number of resistant responses in our data set is much larger than the number of sensitive responses, we also used another metric to better measure the prediction performance: the area under the precision-recall (PR) curve (AUPR). A PR curve traces the performance in a plane with the recall ratio as the X-axis value and the precision ratio as the Y-axis value as the threshold changes.
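A minimal sketch of both areas, computed by sweeping the threshold over the distinct scores, is given below. The text does not state how the areas were integrated, so the trapezoidal rule for AUC and the step-wise (average precision) sum for AUPR are assumptions, as are the function names.

```python
def _counts_at(scores, labels, threshold):
    """Count true and false positives when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp, fp


def roc_auc(scores, labels):
    """Trapezoidal area under the (FPR, TPR) curve."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp, fp = _counts_at(scores, labels, threshold)
        points.append((fp / n_neg, tp / n_pos))
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


def average_precision(scores, labels):
    """Step-wise area under the (recall, precision) curve."""
    n_pos = sum(labels)
    area, previous_recall = 0.0, 0.0
    for threshold in sorted(set(scores), reverse=True):
        tp, fp = _counts_at(scores, labels, threshold)
        recall, precision = tp / n_pos, tp / (tp + fp)
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0]
auc = roc_auc(toy_scores, toy_labels)
aupr = average_precision(toy_scores, toy_labels)
```

On this toy ranking, 5 of the 6 (sensitive, resistant) pairs are ordered correctly, so the AUC is 5/6; the step-wise AUPR happens to take the same value here.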
Furthermore, we may be more interested in the cell lines at the top of the sorted list. Therefore, the percentages of truly sensitive cell lines among the top 10, 20, 50 and 100 cell lines, sorted according to their response scores to the query drug, were also used to evaluate the prediction performance.
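The top-k measure can be illustrated with a short sketch. The function names and the toy ranking are hypothetical; ties between scores are broken arbitrarily here, which the text does not specify.

```python
def top_k_hits(scores, labels, k):
    """Number of truly sensitive cell lines (label 1) among the k highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k])


def top_k_fraction(scores, labels, k):
    """Fraction of the top-k ranked cell lines that are truly sensitive."""
    return top_k_hits(scores, labels, k) / k


toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 0]
hits_at_2 = top_k_hits(toy_scores, toy_labels, 2)
hits_at_3 = top_k_hits(toy_scores, toy_labels, 3)
```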
Parameters selection
Three parameters, K, d and t, need to be set for LGRDRP, where K denotes the maximal number of transition steps, d determines the dimension of the representation vectors and t is the number of features retained after the feature selection procedure. We tested the performance of LGRDRP with different K, d and t. Figure 2 shows the average AUC over different combinations of K, d and t. In the left panel, t was set to 64. Note that when the length \(K \times d\) of the final representation vector is less than 64, all features are kept. As K increased, the average AUC of LGRDRP increased accordingly. However, when K increased from 6 to 7, the performance of LGRDRP improved only slightly, while the computing time increased significantly. When d increased from 16 to 256, the AUC increased accordingly. However, when d was set to 512, the AUC began to decrease, possibly because the model overfits and therefore generalizes poorly.
With \(d = 256\) and \(K = 6\), experiments were also conducted with varying t, and the results are shown in the right panel of Fig. 2. The average AUC was highest when \(t = 64\). When t is too small, the remaining features cannot capture enough structural information; when t is too large, the features may include unimportant information that degrades prediction [25].
Unless otherwise stated, \(K = 6\), \(d = 256\) and \(t = 64\) were used in the following tests.
Performance evaluation
We compared the prediction performance of LGRDRP with three other state-of-the-art methods: HNMDRP [19], SVMDRP [17] and Stanfield’s method [18]. The parameters of HNMDRP, SVMDRP and Stanfield’s method were set as recommended by the corresponding literature. We compared their performances over 226 drugs of the GDSC dataset via the same five-fold cross-validation experiments, and the results are shown in Fig. 3. Figure 3A–C displays their ROC curves on three drugs: VX-680, Erlotinib and Nilotinib. In each case the ROC curve of LGRDRP is clearly above those of the others, which implies that the prediction performance of LGRDRP is the best. Figure 3D reports the AUCs over all drugs as boxplots, which show that LGRDRP is generally more accurate than the other methods. HNMDRP performs slightly better than SVMDRP, and SVMDRP better than Stanfield’s method, which indicates that protein information alone is not sufficient to reveal cell line-drug associations and that integrating multiple kinds of biological information could improve the predictive power. The average AUC of LGRDRP over all drugs is 0.8131, and the highest value, 0.9422, is achieved for SNX-2112, an oral antitumor drug that acts as an Hsp90 inhibitor. For 50% of the drugs, the AUCs of LGRDRP are larger than 0.8229, and for 25% of the drugs, the AUCs of LGRDRP are larger than 0.8734.
Figure 4A–C illustrates the PR curves of LGRDRP, HNMDRP, SVMDRP and Stanfield’s method on three drugs: FMK, AP-24534 and BMS-345541. The PR curves of LGRDRP also lie above those of the other methods. The average AUPR of LGRDRP over all drugs is 8.52%, 11.63% and 16.79% higher than those of HNMDRP, SVMDRP and Stanfield’s method, respectively. The AUPRs of the methods over all drugs are shown in Fig. 4D. The experimental results indicate that LGRDRP accomplishes the prediction task even with the greatly unbalanced data set, demonstrating its reliability and prediction capability.
Figure 5 shows the number of truly sensitive cell lines retrieved among the top 10, 20, 50 and 100 predicted sensitive cell lines for drugs CAY10603 and NVP-BHG712, and LGRDRP again shows a significant advantage over the other methods.
Conclusion
In this paper, we propose a drug response prediction method called LGRDRP. It first uses the cell line miRNA expression profiles to build a cell line similarity network, drug chemical structures to build a drug similarity network, and cell line gene variations and methylation data to build a cell line-gene interaction network. By integrating the known cell line-drug responses into the above networks, LGRDRP constructs a heterogeneous network. Then LGRDRP uses the learning graph representation method GraRep to obtain representation vectors as the topology structure features of the vertices in the network. To avoid the overfitting caused by using too many features, a Laplacian score method is adopted to pick out the important features. Finally, LGRDRP learns an SVM model which is used to predict drug responses. Extensive 5-fold cross-validation experiments showed that LGRDRP was generally superior to three state-of-the-art methods: HNMDRP, SVMDRP and Stanfield’s method. The success of our method is based on the effective integration of diverse biological information, the good graph representation of the topology structure of the network, and the effective feature selection. After minor modifications or simple extensions, LGRDRP can also be employed in other biological prediction tasks such as gene-disease [26], drug-target [27] and microRNA-disease [28] association prediction, and the prediction performance can be further improved by incorporating other appropriate biological information, such as gene function annotations and drug semantic annotations. In clinical practice, some combinations of multiple drugs can increase treatment efficacy, and predicting the response of a cell line to a drug combination is an important extension of single-drug response prediction [29]. In the future, we are going to improve LGRDRP so that it can handle drug combination response prediction.
Methods
Construction of the heterogeneous network
The heterogeneous network consists of a drug similarity network, a cell line similarity network, a gene similarity network, a cell line-drug interaction network, and a cell line-gene interaction network, as shown in Fig. 1A.
The drug similarity network is based on the chemical structure data of 226 frequently used drugs, which consist of a 3D structure similarity matrix of the drugs downloaded from PubChem (http://pubchem.ncbi.nlm.nih.gov/). To reduce the influence of noise and ensure that the network has a clear biological meaning, the elements smaller than 0.2 in the similarity matrix were set to 0. The drug similarity network consists of 226 vertices and 24456 edges, where each vertex denotes a drug and each edge is weighted by a similarity score (\(\ge 0.2\)).
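The thresholding step can be sketched with NumPy as below. The function names and the 3-by-3 toy matrix are illustrative assumptions; the real input would be the 226-by-226 PubChem similarity matrix.

```python
import numpy as np


def threshold_similarity(similarity, cutoff=0.2):
    """Set similarity entries smaller than the cutoff to 0; keep the rest unchanged."""
    similarity = np.asarray(similarity, dtype=float)
    return np.where(similarity >= cutoff, similarity, 0.0)


def count_edges(weights):
    """Count undirected edges, i.e. non-zero entries strictly above the diagonal."""
    return int(np.count_nonzero(np.triu(weights, k=1)))


toy_similarity = np.array([
    [1.00, 0.50, 0.10],
    [0.50, 1.00, 0.25],
    [0.10, 0.25, 1.00],
])
drug_network = threshold_similarity(toy_similarity)
n_edges = count_edges(drug_network)
```

In the toy matrix the pair with similarity 0.10 is dropped, leaving two edges.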
miRNA expression information of cell lines can be used to classify cancer cell lines into subtypes [30], and our cell line similarity network was built on the miRNA expression data of 968 cancer cell lines obtained from CCLE (http://www.broadinstitute.org/ccle). The Pearson correlation coefficient of the miRNA expressions of two cell lines is regarded as the similarity between them, and is used as the weight of the corresponding edge in the network. Protein-protein interactions (PPIs) have been extensively studied, and we used the interactions between proteins to represent the interactions between the genes coding the proteins. The gene-gene similarity network is based on the PPI data from iRefIndex [31] and contains 2981 genes and 53409 gene-gene interactions, with each gene possessing at least 5 interactions.
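A minimal sketch of the Pearson-based cell line similarity is shown below, assuming an expression matrix with one row per cell line and one column per miRNA. The function name and toy values are hypothetical, and the text does not say how negative correlations were treated, so the sketch returns the raw coefficients.

```python
import numpy as np


def cell_line_similarity(expression):
    """Pearson correlation between every pair of rows (cell lines)."""
    return np.corrcoef(np.asarray(expression, dtype=float))


toy_expression = [
    [1.0, 2.0, 3.0, 4.0],  # cell line 0
    [2.0, 4.0, 6.0, 8.0],  # cell line 1: same profile shape as cell line 0
    [4.0, 3.0, 2.0, 1.0],  # cell line 2: reversed profile
]
similarity = cell_line_similarity(toy_expression)
```

Here cell lines 0 and 1 have correlation 1, while cell lines 0 and 2 have correlation -1.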
The cell line-drug interaction network is based on the drug response data of the 968 cell lines and the 226 drugs from GDSC (http://www.cancerrxgene.org/). The responses have been divided into two types, sensitive and resistant, according to the log-normalized IC50 threshold, and there are 20346 sensitive responses and 155277 resistant responses. Accordingly, there are 20346 interaction edges with weight 1 in the cell line-drug interaction network.
Copy number variation, somatic mutation and hypermethylation are called Cancer Functional Events (CFEs). The CFE data of the cell lines were downloaded from GDSC and used to build the cell line-gene interaction network. Similar to previous literature [32], we classified each cell line-gene relationship as associated or unassociated according to whether the coverage percentage of CFEs in the gene of the cell line is higher than 5%. Finally, we obtained a cell line-gene interaction network of 14330 associations between the 968 cell lines and the 2981 genes. The network is a bipartite graph consisting of gene vertices, cell line vertices and edges with weight 1 indicating the corresponding cell line-gene associations.
Finally, we constructed a heterogeneous network including the 226 drug vertices of the drug similarity network, the 2981 gene vertices of the gene similarity network, and the 968 cell line vertices of the cell line similarity network. The network is represented as a weighted graph \(G=(V,E)\). The vertex set of G is \(V=\{v_1,v_2,\ldots,v_n \}\), each element of which denotes a drug, a cell line or a gene. The edge set of G is \(E=\{e_{i,j} \}\), each element of which denotes the relationship between vertex \(v_i\) and vertex \(v_j\), and the weight of an edge \(e_{i,j}\) is set as described above.
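One way to picture the combined graph is as a block adjacency matrix over \(n = 226 + 968 + 2981 = 4175\) vertices. The sketch below assembles such a matrix from the five component networks. The vertex ordering (drugs, cell lines, genes), the function name and the toy sizes are assumptions for illustration; the drug-gene block is zero because no drug-gene edges are listed among the components.

```python
import numpy as np


def assemble_adjacency(drug_sim, cell_sim, gene_sim, cell_drug, cell_gene):
    """Stack the component networks into one symmetric adjacency matrix.

    cell_drug has shape (cells, drugs) and cell_gene has shape (cells, genes).
    """
    n_drug, n_gene = drug_sim.shape[0], gene_sim.shape[0]
    no_drug_gene = np.zeros((n_drug, n_gene))
    return np.block([
        [drug_sim,       cell_drug.T, no_drug_gene],
        [cell_drug,      cell_sim,    cell_gene],
        [no_drug_gene.T, cell_gene.T, gene_sim],
    ])


# Toy sizes: 2 drugs, 3 cell lines, 2 genes.
drug_sim = np.array([[0.0, 0.5], [0.5, 0.0]])
cell_sim = np.array([[0.0, 0.3, 0.0], [0.3, 0.0, 0.7], [0.0, 0.7, 0.0]])
gene_sim = np.array([[0.0, 1.0], [1.0, 0.0]])
cell_drug = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
cell_gene = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
M = assemble_adjacency(drug_sim, cell_sim, gene_sim, cell_drug, cell_gene)
```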
Learning graph representation
Like most similar works, we assume that similar cell lines tend to have similar responses to the same drug, and we predict the response of a query cell line to a certain drug by utilizing the similarity between the query cell line and other cell lines with known responses to the drug. Since the heterogeneous network integrates multiple types of information related to cell lines and drugs, the neighbourhood of cell lines in the network can be used to measure the similarity between cell lines. Based on this idea, we use the learning graph representation method GraRep [23] to obtain representation vectors as the topology structure features of the vertices in the network.
Given a graph G, Learning Graph Representation (LGR) aims to learn a feature vector \(F_i \in R^d\) for vertex \(v_i\) such that the global topology structure information (i.e. the neighbourhood) of the vertex is captured in the vector. In our method, the global topology structure information is represented by the distinct connections between vertices at different numbers of transition steps, which are calculated by the LGR process described as follows.
We use an \(n \times n\) adjacency matrix M to represent the heterogeneous network G, where element \(M_{ij}\) is the weight of the edge between the vertices \(v_i\) and \(v_j\). Based on M, a weighted degree matrix W is calculated according to Eq. (1): W is the diagonal matrix with \(W_{ii} = \sum_{j} M_{ij}\) and \(W_{ij} = 0\) for \(i \ne j\).
An edge of G implies an association relation. Considering transitions between vertices, a larger \(M_{ij}\) means a larger transition probability from \(v_i\) to \(v_j\). Hence the 1-step probability transition matrix T is calculated according to Eq. (2): \(T = W^{-1} M\),
where the element \(T_{ij}\) is the transition probability from vertex \(v_i\) to \(v_j\) in exactly one step. A k-step probability transition matrix \(T^k\) is calculated according to Eq. (3): \(T^k = \underbrace{T \cdot T \cdots T}_{k}\),
where \(T_{ij}^k\) is the transition probability from vertex \(v_i\) to \(v_j\) in exactly k steps.
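The degree normalization and the k-step powers can be sketched as follows, assuming the usual row-normalized random-walk form \(T = W^{-1}M\). The function names are hypothetical, and raising an error for isolated vertices is a choice made for this sketch rather than something the text specifies.

```python
import numpy as np


def transition_matrix(M):
    """Row-normalize the adjacency matrix: T[i, j] = M[i, j] / sum_j M[i, j]."""
    M = np.asarray(M, dtype=float)
    degree = M.sum(axis=1)
    if np.any(degree == 0):
        raise ValueError("isolated vertex: its row cannot be normalized")
    return M / degree[:, None]


def k_step_transitions(T, K):
    """Return [T^1, T^2, ..., T^K] by repeated matrix multiplication."""
    powers, current = [], np.eye(T.shape[0])
    for _ in range(K):
        current = current @ T
        powers.append(current)
    return powers


# A 3-vertex star: vertex 0 is linked to vertices 1 and 2.
toy_M = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
T = transition_matrix(toy_M)
T1, T2 = k_step_transitions(T, 2)
```

In the star graph, a two-step walk from vertex 0 always returns to vertex 0, so the first row of \(T^2\) is (1, 0, 0).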
For a drug d and a cell line c, let the representation vectors of d and c be \(\vec {d}\) and \(\vec {c}\), respectively. We model the probability \(P(E=1 \mid d,c)\) that c is sensitive to d (i.e. there is an edge between the vertices d and c in G) as follows: \(P(E=1 \mid d,c) = \sigma (\vec {d} \cdot \vec {c})\),
where \(\sigma (\cdot )\) is the sigmoid function. Accordingly, \(P(E=0 \mid d,c)\) denotes the probability that c is resistant to d (i.e. there is no edge between the vertices d and c in G) and \(P(E=0 \mid d,c)= \sigma (-\vec {d} \cdot \vec {c})\).
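As a quick consistency check, assuming the sensitive case is modelled as \(P(E=1 \mid d,c)=\sigma(\vec{d}\cdot\vec{c})\) with the standard logistic sigmoid, the two probabilities are complementary only if the resistant case uses the negated inner product:

```latex
\[
\sigma(x) = \frac{1}{1+e^{-x}}, \qquad
\sigma(-x) = \frac{1}{1+e^{x}} = \frac{e^{-x}}{1+e^{-x}} = 1 - \sigma(x),
\]
\[
P(E=1 \mid d,c) + P(E=0 \mid d,c)
  = \sigma(\vec{d}\cdot\vec{c}) + \sigma(-\vec{d}\cdot\vec{c}) = 1 .
\]
```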
Our objective is to maximize \(P(E=1 \mid d,c)\) for each observed edge (d, c) in G while maximizing \(P(E=0 \mid d,c)\) for each resistant response, i.e. where there is no edge between the vertices d and c. Therefore, the following log-likelihood \(\ell\) of G is our global objective function.
where \(V_D\) and \(V_C\) are the drug vertex set and the cell line vertex set, respectively, and E(d, c) indicates whether there is an edge between the drug vertex d and the cell line vertex c: \(E(d,c) = 1\) if there is an edge, and 0 otherwise. N is the number of resistant responses, and \({\mathbb {E}}\) is the expected value of the log-likelihood of resistant responses, defined in Eq. (6).
where \(p_E (c)\) is the transition probability from the drug vertex d to the cell line vertex c.
Let \(x=\vec {d} \cdot \vec {c}\). Maximizing \(\ell\) requires the derivative of \(\ell\) with respect to x to be 0; therefore Eq. (7) follows.
Let \(D_i\) denote the representation vector of the ith drug vertex, \(C_j\) denote the representation vector of the jth cell line vertex, and \(Y_{ij}=D_{i} \cdot C_{j}\). According to Eq. (7), we have:
where T is the probability transition matrix of graph G.
Considering k-step random walks and based on Eqs. (3) and (8), we obtain Eq. (9).
In order to obtain the representation vectors of the drug vertices and the cell line vertices, we apply a popular singular value decomposition (SVD) method to factorize Y.
The representation vector matrix \(F^k\) of k-step random walks is calculated as follows.
In Eqs. (10) and (11), \(\Sigma _{d}^{k}\) is the diagonal matrix composed of the top d singular values and \(U_d^k\) consists of the first d columns of \(U^k\), which are the first d eigenvectors of \(Y^k (Y^k)^T\). For \(k = 1\) to K, we calculate K representation vector matrices, and concatenating them yields an \(n \times K d\) matrix F, whose rows are the representation vectors of the vertices in G. F is used as the input features in the following classification.
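The factorization and concatenation can be sketched as below. The sketch takes the matrices \(Y^k\) as given and assumes the GraRep convention \(F^k = U_d^k (\Sigma_d^k)^{1/2}\); that scaling, the function names and the toy matrices are assumptions of this sketch, since the original equations are not reproduced in the text above.

```python
import numpy as np


def representation_from_Y(Y, d):
    """Truncated SVD of Y: keep the top d singular directions, scaled by sqrt(sigma)."""
    U, singular_values, _ = np.linalg.svd(np.asarray(Y, dtype=float), full_matrices=False)
    # np.linalg.svd returns singular values in descending order.
    return U[:, :d] * np.sqrt(singular_values[:d])


def concatenate_representations(Y_list, d):
    """Stack the K per-step representations side by side into an n x (K*d) matrix."""
    return np.hstack([representation_from_Y(Y, d) for Y in Y_list])


# Toy check on a diagonal matrix whose top-2 truncation is exact.
toy_Y = np.diag([4.0, 1.0, 0.0])
F1 = representation_from_Y(toy_Y, d=2)

# K = 3 random 5 x 5 matrices stand in for Y^1, Y^2, Y^3.
rng = np.random.default_rng(0)
F = concatenate_representations([rng.random((5, 5)) for _ in range(3)], d=2)
```

With \(K = 6\) and \(d = 256\), the same concatenation would give \(6 \times 256 = 1536\) features per vertex before feature selection.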
Classification via support vector machine
As we have assumed before, cell lines with similar topology structures in the network tend to have similar responses to the same drug. Since cell lines with similar topology structures have more similar representation vectors, we construct a binary classification model that takes the representation vectors of the cell lines as input features and outputs their responses (sensitive or resistant) to a query drug. To integrate comprehensive similarity information between cell lines, we consider all k-step representations with \(k =1, 2,\ldots , K\). Using a larger K could capture more distant similarity information but could also introduce more noise.
If drug vertices and gene vertices have few edges with cell line vertices in the heterogeneous network, there will be a large number of poor features for some cell lines. When K and d are large, the number of features (i.e., the length \(K \times d\) of the representation vector) becomes too large, and overfitting occurs. To deal with the problem, we use a Laplacian score method [24] to select the most valuable features from the \(K \times d\) features. For each feature, the Laplacian score method assesses its ability to represent the graph structure and calculates a corresponding score. We only use the features with the top t Laplacian scores for the classification. The values of K, d and t were determined by the 5-fold cross-validation experiments described earlier in the subsection Parameters selection.
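A minimal sketch of Laplacian-score feature selection is given below. It follows the commonly used formulation in which a sample-similarity graph S is built with a heat kernel and the score of a feature f is \(\tilde{f}^T L \tilde{f} / \tilde{f}^T D \tilde{f}\), with D the degree matrix of S and \(L = D - S\). Several points are assumptions of the sketch: a fully connected graph is used instead of a nearest-neighbour graph, the `heat` parameter and function names are hypothetical, and features with the smallest scores are treated as the best ones (the ordering behind "top t" is not spelled out in the text).

```python
import numpy as np


def laplacian_scores(X, heat=1.0):
    """Laplacian score of each column of X (rows are samples); smaller is better."""
    X = np.asarray(X, dtype=float)
    squared_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = np.exp(-squared_dist / heat)  # heat-kernel similarity between samples
    np.fill_diagonal(S, 0.0)          # no self-loops
    degree = S.sum(axis=1)
    L = np.diag(degree) - S           # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_tilde = f - (f @ degree) / degree.sum()  # remove the degree-weighted mean
        denominator = f_tilde @ (degree * f_tilde)
        scores[r] = (f_tilde @ L @ f_tilde) / denominator if denominator > 0 else np.inf
    return scores


def select_features(X, t):
    """Indices of the t best-scoring features; all features if there are fewer than t."""
    order = np.argsort(laplacian_scores(X))
    return np.sort(order[:t])


# Feature 0 separates two tight groups of samples; feature 1 varies within each group.
toy_X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
toy_scores = laplacian_scores(toy_X)
kept = select_features(toy_X, 1)
```

In the toy data, feature 0 is nearly constant across strongly connected samples and receives the smaller score, so it is the one kept.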
In the heterogeneous network, the number of sensitive drug responses is 20346, while the number of resistant drug responses is 155277, a severe imbalance between positive and negative samples. Since the support vector machine (SVM) method can deal with the imbalance problem by assigning different weights to positive and negative samples, we chose SVM to conduct the classification task. In the following experiments, we employed LIBSVM [24] to do the classification. LIBSVM is an integrated software package including diverse SVM models, which can be chosen by setting some options. We set the options of LIBSVM as follows: the SVM_type was set to the default C-SVC, the kernel type was set to polynomial with the default order 3, the weights for the positive class and the negative class were set to the number of negative samples and the number of positive samples, respectively, and the other options were left at their defaults. For each drug, LIBSVM learns an SVC model on the training data, and for each cell line, the model outputs a decision score. If the decision score is larger than 0, the cell line is predicted to be sensitive to the drug, and a larger score indicates a more confident prediction.
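The option settings described above map onto the flags of LIBSVM's `svm-train` tool (`-s` for the SVM type, `-t` for the kernel type, `-d` for the polynomial degree and `-wi` for the weight of class i). The sketch below only builds such an option string and applies the decision rule. The +1/-1 label encoding, the function names and the use of the overall response counts as example inputs are assumptions; in a per-drug experiment the counts would come from that drug's training set.

```python
def libsvm_options(n_positive, n_negative, positive_label=1, negative_label=-1):
    """Build svm-train options: C-SVC, polynomial kernel of degree 3, class weights.

    The positive class is weighted by the number of negative samples and vice versa,
    so the rarer class receives the larger weight.
    """
    return (
        "-s 0 -t 1 -d 3 "
        f"-w{positive_label} {n_negative} -w{negative_label} {n_positive}"
    )


def predict_sensitive(decision_score):
    """A cell line is predicted sensitive when its decision score is larger than 0."""
    return decision_score > 0


options = libsvm_options(n_positive=20346, n_negative=155277)
```

With these counts, errors on sensitive samples are weighted roughly 7.6 times more heavily than errors on resistant samples.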
Availability of data and materials
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
Abbreviations
AUC: Area under receiver operating characteristic curve
AUPR: Area under precision-recall curve
SVM: Support vector machine
References
...Costello JC, Heiser LM, Georgii E, Gonen M, Menden MP, Wang NJ, Bansal M, Ammad-ud-din M, Hintsanen P, Khan SA, Mpindi JP, Kallioniemi O, Honkela A, Aittokallio T, Wennerberg K, Community ND, Collins JJ, Gallahan D, Singer D, Saez-Rodriguez J, Kaski S, Gray JW, Stolovitzky G. A community effort to assess and improve drug sensitivity prediction algorithms. Nat Biotechnol. 2014;32(12):1202–12. https://doi.org/10.1038/nbt.2877.
Eisenstein M. Personalized medicine: special treatment. Nature. 2014;513(7517):8–9. https://doi.org/10.1038/513S8a.
Mirnezami R, Nicholson J, Darzi A. Preparing for precision medicine. N Engl J Med. 2012;366(6):489–91. https://doi.org/10.1056/NEJMp1114866.
Yang W, Soares J, Greninger P, Edelman EJ, Lightfoot H, Forbes S, Bindal N, Beare D, Smith JA, Thompson IR, Ramaswamy S, Futreal PA, Haber DA, Stratton MR, Benes C, McDermott U, Garnett MJ. Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Res. 2013;41(Database issue):955–61. https://doi.org/10.1093/nar/gks1111.
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D, Reddy A, Liu M, Murray L, Berger MF, Monahan JE, Morais P, Meltzer J, Korejwa A, Jane-Valbuena J, Mapa FA, Thibault J, Bric-Furlong E, Raman P, Shipway A, Engels IH, Cheng J, Yu GK, Yu J, Aspesi JP, de Silva M, Jagtap K, Jones MD, Wang L, Hatton C, Palescandolo E, Gupta S, Mahan S, Sougnez C, Onofrio RC, Liefeld T, MacConaill L, Winckler W, Reich M, Li N, Mesirov JP, Gabriel SB, Getz G, Ardlie K, Chan V, Myer VE, Weber BL, Porter J, Warmuth M, Finan P, Harris JL, Meyerson M, Golub TR, Morrissey MP, Sellers WR, Schlegel R, Garraway LA. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483(7391):603–7. https://doi.org/10.1038/nature11003.
Torkamani A, Schork NJ. Background gene expression networks significantly enhance drug response prediction by transcriptional profiling. Pharmacogenomics J. 2012;12(5):446–52. https://doi.org/10.1038/tpj.2011.35.
Gupta S, Chaudhary K, Kumar R, Gautam A, Nanda JS, Dhanda SK, Brahmachari SK, Raghava GP. Prioritization of anticancer drugs against a cancer using genomic features of cancer cells: a step towards personalized medicine. Sci Rep. 2016;6:23857. https://doi.org/10.1038/srep23857.
Fang Y, Xu P, Yang J, Qin Y. A quantile regression forest based method to predict drug response and assess prediction reliability. PLoS One. 2018;13(10):0205155. https://doi.org/10.1371/journal.pone.0205155.
Dong ZL, Zhang NQ, Li C, Wang HY, Fang Y, Wang J, Zheng XQ. Anticancer drug sensitivity prediction in cell lines from baseline gene expression through recursive feature selection. BMC Cancer. 2015;15:489. https://doi.org/10.1186/s1288501514926.
Geeleher P, Cox NJ, Huang RS. Clinical drug response can be predicted using baseline gene expression levels and in vitro drug sensitivity in cell lines. Genome Biol. 2014;15(3):47. https://doi.org/10.1186/gb2014153r47.
Liu CY, Wei D, Xiang J, Ren FQ, Huang L, Lang JD, Tian G, Li YS, Yang JL. An improved anticancer drug-response prediction based on an ensemble method integrating matrix completion and ridge regression. Mol Ther Nucleic Acids. 2020;21:676–86. https://doi.org/10.1016/j.omtn.2020.07.003.
Ammad-ud-din M, Georgii E, Gonen M, Laitinen T, Kallioniemi O, Wennerberg K, Poso A, Kaski S. Integrative and personalized QSAR analysis in cancer by kernelized Bayesian matrix factorization. J Chem Inf Model. 2014;54(8):2347–59. https://doi.org/10.1021/ci500152b.
Li M, Wang Y, Zheng R, Shi X, Li Y, Wu FX, Wang J. DeepDSC: a deep learning method to predict drug sensitivity of cancer cell lines. IEEE/ACM Trans Comput Biol Bioinform. 2021;18(2):575–82. https://doi.org/10.1109/TCBB.2019.2919581.
Yan XY, Zhang SW, Yiu SM, Shi JY. Interpretable prediction of drug-cell line response by triple matrix factorization. Quantit Biol. 2021;9(4):426–39. https://doi.org/10.15302/jqb0210259.
Guvenc Paltun B, Kaski S, Mamitsuka H. Diverse: Bayesian data integrative learning for precise drug response prediction. IEEE/ACM Trans Comput Biol Bioinform. 2021. https://doi.org/10.1109/tcbb.2021.3065535.
Creighton CJ. Molecular classification and drug response prediction in cancer. Curr Drug Targets. 2012;13(12):1488–94. https://doi.org/10.2174/138945012803530143.
Wang Y, Fang J, Chen S. Inferences of drug responses in cancer cells from cancer genomic features and compound chemical and therapeutic properties. Sci Rep. 2016;6:32679. https://doi.org/10.1038/srep32679.
Stanfield Z, Coskun M, Koyuturk M. Drug response prediction as a link prediction problem. Sci Rep. 2017;7:40321. https://doi.org/10.1038/srep40321.
Zhang F, Wang M, Xi J, Yang J, Li A. A novel heterogeneous networkbased method for drug response prediction in cancer cell lines. Sci Rep. 2018;8(1):3355. https://doi.org/10.1038/s41598018216224.
Liu H, Zhao Y, Zhang L, Chen X. Anticancer drug response prediction using neighborbased collaborative filtering with global effect removal. Mol Ther Nucleic Acids. 2018;13:303â€“11. https://doi.org/10.1016/j.omtn.2018.09.011.
Zhang L, Chen X, Guan NN, Liu H, Li JQ. A hybrid interpolation weighted collaborative filtering method for anticancer drug response prediction. Front Pharmacol. 2018;9:1017. https://doi.org/10.3389/fphar.2018.01017.
Guan NN, Zhao Y, Wang CC, Li JQ, Chen X, Piao X. Anticancer drug response prediction in cell lines using weighted graph regularized matrix factorization. Mol Ther Nucleic Acids. 2019;17:164â€“74. https://doi.org/10.1016/j.omtn.2019.05.017.
Cao S, Lu W, Xu Q. Grarep: Learning graph representations with global structural information. In: Proceedings of the 24th ACM international on conference on information and knowledge management. New York, NY, United States: Association for Computing Machinery; 2015. pp. 891â€“900. https://doi.org/10.1145/2806416.2806512.
Chang CC, Lin CJ. Libsvm: a library for support vector machines. Acm Trans Intell Syst Technol. 2011. https://doi.org/10.1145/1961189.1961199.
Awan SE, Bennamoun M, Sohel F, Sanfilippo FM, Chow BJ, Dwivedi G. Feature selection and transformation by machine learning reduce variable numbers and improve prediction for heart failure readmission or death. PLoS One. 2019;14(6):0218760. https://doi.org/10.1371/journal.pone.0218760.
Le DH. Machine learningbased approaches for disease gene prediction. Brief Funct Genomics. 2020;19(5â€“6):350â€“63. https://doi.org/10.1093/bfgp/elaa013.
Chen X, Yan CC, Zhang XT, Zhang X, Dai F, Yin J, Zhang YD. Drugtarget interaction prediction: databases, web servers and computational models. Brief Bioinform. 2016;17(4):696â€“712. https://doi.org/10.1093/bib/bbv066.
Chen X, Xie D, Zhao Q, You ZH. Micrornas and complex diseases: from experimental results to computational models. Brief Bioinform. 2019;20(2):515â€“39. https://doi.org/10.1093/bib/bbx130.
Chen X, Ren B, Chen M, Wang Q, Zhang L, Yan G. Nllss: predicting synergistic drug combinations based on semisupervised learning. Plos Comput Biol. 2016;12(7):1004975. https://doi.org/10.1371/journal.pcbi.1004975.
Yu CW, Dai DJ, Xie J. Molecular subtype classification of papillary renal cell cancer using mirna expression. Oncotargets Therapy. 2019;12:2311â€“22. https://doi.org/10.2147/Ott.S193808.
Razick S, Magklaras G, Donaldson IM. irefindex: a consolidated protein interaction database with provenance. BMC Bioinformatics. 2008;9:405. https://doi.org/10.1186/147121059405.
Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Goncalves E, Barthorpe S, Lightfoot H, Cokelaer T, Greninger P, van Dyk E, Chang H, de Silva H, Heyn H, Deng XM, Egan RK, Liu QS, Mironenko T, Mitropoulos X, Richardson L, Wang JH, Zhang TH, Moran S, Sayols S, Soleimani M, Tamborero D, LopezBigas N, RossMacdonald P, Esteller M, Gray NS, Haber DA, Stratton MR, Benes CH, Wessels LFA, SaezRodriguez J, McDermott U, Garnett MJ. A landscape of pharmacogenomic interactions in cancer. Cell. 2016;166(3):740â€“54. https://doi.org/10.1016/j.cell.2016.06.017.
Acknowledgements
We would like to thank the anonymous reviewers for their helpful and constructive comments.
About this supplement
This article has been published as part of BMC Bioinformatics Volume 23 Supplement 8, 2022: Selected articles from the 16th International Symposium on Bioinformatics Research and Applications (ISBRA-20): bioinformatics. The full contents of the supplement are available online at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-23-supplement-8.
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 62172028, 61772197).
Author information
Contributions
M.X. and J.O. designed the model, algorithm and experiments. J.O. implemented the algorithm and X.L. conducted the experiments. M.X., G.L. and J.Z. wrote the paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Xie, M., Lei, X., Zhong, J. et al. Drug response prediction using graph representation learning and Laplacian feature selection. BMC Bioinformatics 23 (Suppl 8), 532 (2022). https://doi.org/10.1186/s12859-022-05080-4
DOI: https://doi.org/10.1186/s12859-022-05080-4