Predicting substrates of the human breast cancer resistance protein using a support vector machine method
© Hazai et al.; licensee BioMed Central Ltd. 2013
Received: 6 November 2012
Accepted: 12 April 2013
Published: 15 April 2013
Human breast cancer resistance protein (BCRP) is an ATP-binding cassette (ABC) efflux transporter that confers multidrug resistance in cancers and also plays an important role in the absorption, distribution and elimination of drugs. Predicting whether drugs or new molecular entities are BCRP substrates would provide a cost-effective means of evaluating the pharmacokinetic properties, efficacy, and safety of these drugs or drug candidates. To date, few studies have developed in silico prediction models for BCRP substrates.
In this study, we developed support vector machine (SVM) models to predict wild-type BCRP substrates based on a total of 263 known BCRP substrates and non-substrates collected from the literature. The final SVM model was integrated into a free web server.
We showed that the final SVM model had an overall prediction accuracy of ~73% for an independent external validation data set of 40 compounds. The prediction accuracy for wild-type BCRP substrates was ~76%, which is higher than that for non-substrates. The free web server (http://bcrp.althotas.com) allows the users to predict whether a query compound is a wild-type BCRP substrate and calculate its physicochemical properties such as molecular weight, logP value, and polarizability.
We have developed an SVM prediction model for wild-type BCRP substrates based on a relatively large number of known wild-type BCRP substrates and non-substrates. This model may prove valuable for screening substrates and non-substrates of BCRP, a clinically important ABC efflux drug transporter.
Keywords: Breast cancer resistance protein; Support vector machine; SVM; ATP-binding cassette; ABC transporter; in silico prediction; Substrate; BCRP; ABCG2
Human breast cancer resistance protein (BCRP, gene symbol ABCG2) is an ATP-binding cassette (ABC) efflux drug transporter [1, 2]. BCRP is one of the ABC transporters that confer resistance to a large number of structurally and chemically unrelated chemotherapeutic agents through ATP hydrolysis-dependent efflux of these drugs. The list of known BCRP substrates has been expanding rapidly and now includes not only chemotherapeutics such as mitoxantrone, topotecan and imatinib, but also non-chemotherapeutic drugs such as prazosin, glyburide, nitrofurantoin and statins, as well as non-therapeutic compounds such as dietary flavonoids, porphyrins, estrone 3-sulfate, and the dietary carcinogen 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine [1, 2]. BCRP is also highly expressed in organs important for the absorption (the small intestine), elimination (the liver and kidney), and distribution (the blood-brain and placental barriers) of drugs and xenobiotics, and has recently been recognized by the FDA as one of the most important drug transporters involved in clinically relevant drug disposition and drug-drug interactions. Because of the clinical importance of BCRP in drug resistance and drug disposition, cost-effective methods for evaluating the transport of drugs or drug candidates by BCRP would be highly valuable, since they would allow the pharmacokinetics, efficacy, safety, and tissue levels of these compounds to be predicted. One such method is the development of in silico models for predicting BCRP substrates.
Indeed, in recent years in silico prediction models have entered the drug discovery pipeline, where they allow initial screening and selection of promising compounds from chemical libraries and large databases. In addition, these models can provide information about the mechanism of protein-ligand interactions. In silico methods for predicting protein-ligand interactions, including transport characteristics, can be divided into ligand-based and protein structure-based approaches. With protein structure-based methods such as molecular docking, the structure and physicochemical characteristics of the complex formed between a protein and a ligand can be predicted if high-resolution structures of both the protein and the ligand are available. However, no high-resolution structure of BCRP has yet been determined. Homology models of BCRP have recently been developed and await further experimental validation [1, 5]. Although these homology models can be used for docking calculations and interpretation of biochemical data, the results are unlikely to be reliable enough for drug design and screening. In contrast, ligand-based methods, which rely on the structural similarity of ligands to known substrates, generally yield much higher prediction accuracies than protein structure-based methods.
Among ligand-based methods, one common approach is to develop structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) models. The objective of SAR and QSAR analysis is to establish a correlation between molecular descriptors, which encode the structures of ligands, and biological activities for a series of biologically and structurally characterized compounds. Various SAR and QSAR models for BCRP inhibitors have been published [6-8]. Several SAR and QSAR studies suggest that lipophilicity is a good predictor of BCRP inhibition [9-11], but other studies argue that this property is not significant [12, 13]. A planar structure appears to be necessary for inhibitors to bind to the active site of BCRP [9, 14, 15]. With respect to BCRP substrates, only one SAR study, of camptothecin analogues, has been reported; it suggested that hydrogen bond formation might be important for substrate recognition by BCRP. A common limitation of these SAR and QSAR models is that they are usually built using a congeneric series of molecules and thus may not be valid for other classes of compounds. For this reason, more sophisticated techniques are required for classification of BCRP ligands.
Another ligand-based approach is to use statistical learning methods, which learn to predict a property from labeled examples and can be applied to compounds of any chemical structure. Of these methods, the support vector machine (SVM) is the most frequently used and has proved valuable in a wide range of applications. SVM has gained popularity in chemoinformatics and bioinformatics because it can classify objects into two classes based on their structural features. In particular, SVM has been useful for classifying molecules as substrates or non-substrates of enzymes or transporters. For example, several studies have used SVM to predict substrates and non-substrates of P-glycoprotein (P-gp), generally with prediction accuracies greater than 70% [17-20]. Zhong et al. recently reported a genetic algorithm-conjugate gradient-support vector machine (GA-CG-SVM) procedure for predicting BCRP substrates and non-substrates. Although these studies are highly valuable, most of the published in silico models are not openly accessible to the scientific community. Only a few SVM-based free web servers exist for predicting substrates and non-substrates of particular enzymes and transporters. For example, Mishra et al. reported a web server for cytochrome P450 enzymes, and our laboratories published a free web server for predicting P-gp substrates and non-substrates using the SVM method (http://pgp.althotas.com).
Therefore, in the present study, we compiled a relatively large data set of BCRP substrates and non-substrates from the literature and developed an SVM-based in silico model for predicting wild-type BCRP substrates and non-substrates. This prediction model has been integrated into a free web server (http://bcrp.althotas.com), which allows users to predict whether wild-type BCRP transports a query compound and to calculate its physicochemical properties, including molecular weight, logP value, and polarizability.
All known wild-type BCRP substrates and non-substrates used in this study were taken from published data in the literature. Information on some of these compounds was obtained by searching the University of Washington Metabolism & Transport Drug Interaction Database (http://www.druginteractioninfo.org/). The data set is based on results of in vitro transport assays, such as the membrane vesicle uptake assay, the efflux assay using intact mammalian cells over-expressing BCRP, and the transwell transport assay using MDCKII/BCRP cells. Results from in vitro drug resistance assays were also used. However, results from drug-stimulated ATPase assays were not used, because many substrates do not stimulate the ATPase activity of BCRP. In cases of conflicting evidence, only results confirmed by at least two independent studies were accepted. The data set contains 164 BCRP substrates and 99 non-substrates with highly diverse chemical structures. Of the 164 substrates, 60 were supported by multiple reports, whereas only about 9 of the 99 non-substrates were. It is worth noting that drug-selected BCRP mutants with amino acid substitutions at position 482 exhibit altered substrate specificity. For example, doxorubicin, rhodamine 123 and LysoTracker Green are substrates of the R482G or R482T mutant but cannot be efficiently transported by wild-type BCRP [23-25]. Such compounds were therefore classified as non-substrates of wild-type BCRP, which is the subject of this study. Of the 263 compounds (164 substrates and 99 non-substrates), 223 compounds (139 substrates and 84 non-substrates) were randomly assigned to training and test subsets at various training/test ratios, and the remaining 40 compounds (25 substrates and 15 non-substrates) were defined as the independent external validation subset. All compounds are listed in Additional file 1: Table S1.
The chemical structures of all these molecules are shown in two sdf files provided as Additional files 2 and 3 which can be viewed using the free MarvinView software (http://www.chemaxon.com/products/marvin/marvinview/).
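The partitioning of the 263 compounds described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_dataset`, the dummy compound identifiers, the random draw of the external subset, and the 0.8 training fraction are all assumptions made for the example (the study tested several training/test ratios and does not state here how the 40 external compounds were chosen).

```python
import random

def split_dataset(substrates, non_substrates, n_ext_sub=25, n_ext_non=15,
                  train_fraction=0.8, seed=0):
    """Hypothetical sketch: hold out an external validation subset, then
    split the remaining compounds into training and test subsets.

    Each input is a list of compound identifiers. Returns three lists of
    (identifier, label) pairs, with label +1 = substrate, -1 = non-substrate.
    """
    rng = random.Random(seed)
    sub = list(substrates)
    non = list(non_substrates)
    rng.shuffle(sub)
    rng.shuffle(non)

    # External validation subset: fixed numbers of substrates and non-substrates.
    external = [(c, +1) for c in sub[:n_ext_sub]] + [(c, -1) for c in non[:n_ext_non]]

    # The remaining compounds form the pool that is split into training and test subsets.
    pool = [(c, +1) for c in sub[n_ext_sub:]] + [(c, -1) for c in non[n_ext_non:]]
    rng.shuffle(pool)
    n_train = round(train_fraction * len(pool))
    return pool[:n_train], pool[n_train:], external

# Usage with dummy identifiers matching the counts in the text (164 + 99 = 263).
subs = [f"S{i}" for i in range(164)]
nons = [f"N{i}" for i in range(99)]
train, test, external = split_dataset(subs, nons)
```

Because one seed controls both draws here, calling the function with a new seed would also change the external subset. To keep the 40 external compounds fixed across repeated runs, one would draw them once and reshuffle only the remaining pool of 223 compounds for each training/test split.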
Support vector machine (SVM)
The SVM method used in this study is essentially the same as previously described. Briefly, classification by SVM can be divided into four stages. In the first stage, all compounds in the data set were labeled as substrates or non-substrates of wild-type BCRP and characterized using molecular descriptors; the data set was then split into training and test subsets, and an independent external validation subset was also created. In the second stage, the compounds in the training set were represented as points in a high-dimensional space according to their molecular descriptors, and a hyperplane was determined in this space to separate them into substrate and non-substrate groups. Because many different hyperplanes may separate the two groups, the one that maximizes the margin, that is, the distance between the hyperplane and the nearest training compounds of either class, is chosen. In the third stage, the models constructed using the training set were evaluated by calculating their prediction accuracy for the test set. Finally, the models were validated using the independent external validation set.
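To make the second stage concrete, the sketch below finds a maximum-margin linear separator with a Pegasos-style stochastic subgradient method on the soft-margin (hinge-loss) SVM objective. This is a swapped-in, simplified technique for illustration only: the study itself used LIBSVM and also tested nonlinear kernels, and the toy two-descriptor data, the function names `train_linear_svm` and `predict`, and the parameter values are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the soft-margin objective
        lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    Labels must be +1 (substrate) or -1 (non-substrate). A constant 1.0 feature
    is appended so that the last weight acts as a bias (this also regularizes
    the bias, a common simplification)."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: gradient step on the regularization term.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: push w towards classifying x correctly.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Return +1 (substrate) or -1 (non-substrate) from the side of the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Toy, linearly separable data with two hypothetical descriptors per compound.
X = [(2, 2), (3, 3), (2, 3), (3, 2), (-2, -2), (-3, -3), (-2, -3), (-3, -2)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
predictions = [predict(w, x) for x in X]
```

A kernelized solver such as LIBSVM works on the dual problem instead, which is what allows nonlinear kernels like the RBF kernel to replace the dot product used here.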
Chemical structures of all wild-type BCRP substrates and non-substrates used in this study were downloaded from the PubChem Database (http://pubchem.ncbi.nlm.nih.gov). Some compounds were extracted from the original publications and redrawn using MarvinView (ChemAxon, Budapest, Hungary). All molecules were subjected to geometry optimization with the Molconvert software (ChemAxon, Budapest, Hungary), which applies the Dreiding molecular mechanics force field, and to calculation of Gasteiger partial charges. The DragonX software (http://www.talete.mi.it) was used to calculate a total of 3250 molecular descriptors for each molecule. Descriptors with more than 80% zero values, as well as descriptors with very small standard deviations (less than 3%), were eliminated. The LIBSVM software (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) was then used for SVM calculations. Linear, polynomial, and radial basis function (RBF) kernels were tested in this study. The RBF kernel is calculated as $K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2)$, where $K$ is the kernel function, $\gamma$ is the kernel parameter, and $\mathbf{x}_i$ and $\mathbf{x}_j$ are the descriptor vectors of two compounds. The prediction power of SVM is strongly influenced by the choice of kernel, the kernel parameter $\gamma$, and the soft-margin parameter $C$.
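The descriptor filter and the RBF kernel described above can be written out as a short sketch. It is an illustration under stated assumptions, not the DragonX or LIBSVM implementation: the function names are hypothetical, a descriptor is dropped if it meets either criterion, and the 3% standard-deviation threshold is interpreted as 0.03 on descriptors already scaled to [0, 1], which the text does not specify.

```python
import math
from statistics import pstdev

def filter_descriptors(matrix, names, max_zero_frac=0.80, min_std=0.03):
    """Drop descriptor columns that are mostly zero or nearly constant.

    matrix: list of rows (one per compound), each a list of descriptor values.
    Assumption: values are pre-scaled to [0, 1], so min_std=0.03 stands for "3%".
    Returns (kept_names, filtered_matrix).
    """
    n_rows = len(matrix)
    keep = []
    for j in range(len(names)):
        col = [row[j] for row in matrix]
        zero_frac = sum(1 for v in col if v == 0) / n_rows
        # Eliminate columns with >80% zeros or a very small standard deviation.
        if zero_frac > max_zero_frac or pstdev(col) < min_std:
            continue
        keep.append(j)
    return [names[j] for j in keep], [[row[j] for j in keep] for row in matrix]

def rbf_kernel(xi, xj, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for two descriptor vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

# Hypothetical 5-compound, 4-descriptor example.
names = ["zeros", "const", "ok", "sparse"]
matrix = [
    [0, 0.5, 0.1, 0],
    [0, 0.5, 0.4, 0],
    [0, 0.5, 0.9, 0],
    [0, 0.5, 0.2, 0],
    [0, 0.5, 0.7, 1],
]
kept_names, kept = filter_descriptors(matrix, names)
```

In this example the all-zero and constant columns are removed, while the "sparse" column survives because exactly 80% of its values are zero, which does not exceed the threshold.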
The web server
The best prediction model generated using the SVM method described above has been integrated into a free web server (http://bcrp.althotas.com), which allows users to predict whether a query compound is likely to be a BCRP substrate. Query compounds can be searched by name, because the web server is linked to PubChem and retrieves structures directly by text search; uploaded in PDB, mol, mol2, hin, or SMILES format; or drawn using the built-in ChemAxon Marvin Java applet. Structural conversions and 3-dimensional geometry optimization with the Dreiding force field are carried out using the Molconvert software, and 2-dimensional and 3-dimensional molecular descriptors are calculated using the DragonX software.
Results and discussion
The mean values of SVM prediction performance parameters of 100 runs using various kernels
Performance parameters of 100 runs using various ratios of training/test sets
Prediction power of the selected SVM model
Overlap of classification in 10 experimental models
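The performance parameters used in these evaluations (overall accuracy ACC, sensitivity for substrates, specificity for non-substrates, and the Matthews correlation coefficient MCC) follow from a confusion matrix in which substrates are the positive class. The sketch below uses the standard definitions; the function name `performance`, the SE/SP keys, and the counts in the usage example are hypothetical and are not results from this study.

```python
import math

def performance(tp, fn, tn, fp):
    """Standard binary-classification metrics, with substrates as positives.

    tp/fn: substrates predicted as substrates / as non-substrates.
    tn/fp: non-substrates predicted as non-substrates / as substrates.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)  # overall prediction accuracy
    se = tp / (tp + fn)                    # sensitivity: accuracy for substrates
    sp = tn / (tn + fp)                    # specificity: accuracy for non-substrates
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SE": se, "SP": sp, "MCC": mcc}

# Hypothetical counts for a 20-compound test set.
metrics = performance(tp=8, fn=2, tn=6, fp=4)
```

MCC is useful here because the data set is imbalanced (164 substrates versus 99 non-substrates): a model that favors the majority class can still reach a reasonable ACC, whereas MCC penalizes errors in both classes.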
Recently, an SVM study based on a different data set was published by Zhong et al., who reported a higher overall prediction accuracy for BCRP substrates and non-substrates (85% for the test set). However, the compounds used by Zhong et al. were divided into only two sets, a training set and a test set, without an independent external validation set, and their test set was not independent because it was also used to select the best model. In contrast, besides the training and test sets, we used an independent external validation set to evaluate the prediction outcome and calculate the prediction accuracies of the selected best model. Therefore, their results cannot be directly compared with those of this study. Moreover, certain compounds in the data sets of Zhong et al. were in fact the same compound under different names (e.g., folic acid and vitamin B9, and daunomycin and daunorubicin). Additionally, a number of compounds (e.g., daunorubicin, rhodamine 123, LysoTracker Green, and epirubicin) were classified as BCRP substrates by Zhong et al. but as non-substrates in this study, for the reasons explained in the Methods section.
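Duplicate entries of the kind noted above can be caught by mapping every compound name to a canonical identifier before the data are split. In this sketch the synonym table and its placeholder keys (`FOLATE`, `DAUNORUBICIN`) are hypothetical; in practice the mapping would come from a chemical database such as PubChem or from structure-based identifiers. The same grouping also exposes a compound that carries conflicting substrate/non-substrate labels; the labels in the usage example are made up to show such a conflict.

```python
from collections import defaultdict

# Hypothetical synonym table: lower-case name -> canonical key.
SYNONYMS = {
    "folic acid": "FOLATE",
    "vitamin b9": "FOLATE",
    "daunomycin": "DAUNORUBICIN",
    "daunorubicin": "DAUNORUBICIN",
}

def find_duplicates(records, synonyms=SYNONYMS):
    """Group (name, label) records by canonical key.

    Returns (duplicates, conflicts): duplicates lists groups of names that
    refer to the same compound; conflicts lists canonical keys whose records
    carry more than one label.
    """
    groups = defaultdict(list)
    for name, label in records:
        key = synonyms.get(name.lower(), name.lower())  # unknown names map to themselves
        groups[key].append((name, label))
    duplicates = [[n for n, _ in g] for g in groups.values() if len(g) > 1]
    conflicts = [k for k, g in groups.items() if len({lab for _, lab in g}) > 1]
    return duplicates, conflicts

records = [("folic acid", 1), ("vitamin B9", 1), ("daunomycin", 1),
           ("daunorubicin", -1), ("mitoxantrone", 1)]
duplicates, conflicts = find_duplicates(records)
```

Running such a check before splitting also prevents the same compound from appearing in both the training and the validation subsets under two names, which would inflate the apparent prediction accuracy.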
List of molecular descriptors used by the selected SVM model
Mean information index on atomic composition
3D-MoRSE signal 17 / weighted by mass
3D-MoRSE signal 25 / weighted by mass
GETAWAY R autocorrelation of lag 2 / weighted by mass
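The 3D-MoRSE descriptors listed above encode the 3-dimensional arrangement of atoms through a weighted sum over all atom pairs, $I(s) = \sum_{i=2}^{N}\sum_{j=1}^{i-1} w_i w_j \frac{\sin(s\,r_{ij})}{s\,r_{ij}}$, where $r_{ij}$ is the distance between atoms $i$ and $j$, $w_i$ is an atomic weight (mass, for the descriptors above), and $s$ is the scattering parameter. The sketch below evaluates this sum; it is not the DragonX code, and the mapping of signal $k$ to $s = k - 1$ Å⁻¹ and the use of carbon-scaled atomic masses are assumptions about DragonX's conventions.

```python
import math

def morse_signal(coords, weights, s):
    """3D-MoRSE signal at scattering parameter s (in 1/angstrom).

    coords: list of (x, y, z) atomic positions in angstrom.
    weights: per-atom weights (e.g., atomic masses, assumed scaled to carbon = 1).
    """
    total = 0.0
    for i in range(1, len(coords)):
        for j in range(i):
            x = s * math.dist(coords[i], coords[j])
            # sin(x)/x tends to 1 as x -> 0, which gives the s = 0 signal.
            total += weights[i] * weights[j] * (1.0 if x == 0 else math.sin(x) / x)
    return total

def signal_to_s(k):
    """Assumed convention: signal 1 at s = 0, one signal per 1/angstrom step."""
    return float(k - 1)

# Hypothetical C-C-O fragment; weights are masses relative to carbon (O ~ 15.999/12.011).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
weights = [1.0, 1.0, 1.332]
mor17 = morse_signal(coords, weights, signal_to_s(17))
```

Because every term depends on interatomic distances in the optimized geometry, these signals change with the 3-dimensional conformation of a compound, not just with its connectivity.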
To make the SVM model publicly available, we developed a free web server (http://bcrp.althotas.com) that enables users to predict whether a query compound is a BCRP substrate based on the SVM prediction model selected in this study.
In summary, BCRP is an ABC drug transporter that confers multidrug resistance in cancers and plays an important role in drug disposition. It is therefore important to develop in silico prediction models for BCRP substrates that can serve as cost-effective tools for screening drug candidates at an early stage of drug discovery and for identifying BCRP substrates among existing drugs, so that potential drug-drug interactions may be predicted. In the present study, using a carefully defined and relatively large data set of 263 known wild-type BCRP substrates and non-substrates, we have developed an SVM model for predicting wild-type BCRP substrates and non-substrates with an overall prediction accuracy of ~73% for an independent external validation data set of 40 compounds. The prediction accuracy for wild-type BCRP substrates was ~76%, which is higher than that for non-substrates. The molecular descriptors used by this SVM model suggest that the 3-dimensional structure of a compound may be a predominant factor in determining BCRP/substrate interactions. This SVM prediction model has been integrated into a web server (http://bcrp.althotas.com) that is freely available to the scientific community. We believe that the availability of such a prediction model will facilitate drug discovery as well as basic research investigating the role of BCRP in drug transport.
Abbreviations
BCRP: Breast cancer resistance protein
ABCG2: The second member of ABC transporter subfamily G
SVM: Support vector machine
ACC: Accuracy (overall prediction accuracy)
SP: Specificity (prediction accuracy for non-substrates)
SE: Sensitivity (prediction accuracy for substrates)
MCC: Matthews correlation coefficient (a more balanced prediction parameter than ACC)
This work was supported by the Hungarian State and the European Union (European Regional Development Fund), under the aegis of New Hungary Development Plan (KMOP-1.1.1-09/1-2009-0044), and by a grant from the National Institutes of Health, GM073715 (to QM).
- Ni Z, Bikadi Z, Rosenberg MF, Mao Q: Structure and function of the human breast cancer resistance protein (BCRP/ABCG2). Curr Drug Metab. 2010, 11 (7): 603-617. 10.2174/138920010792927325.
- Natarajan K, Xie Y, Baer MR, Ross DD: Role of breast cancer resistance protein (BCRP/ABCG2) in cancer drug resistance. Biochem Pharmacol. 2012, 83 (8): 1084-1103. 10.1016/j.bcp.2012.01.002.
- Maliepaard M, Scheffer GL, Faneyte IF, van Gastelen MA, Pijnenborg AC, Schinkel AH, van De Vijver MJ, Scheper RJ, Schellens JH: Subcellular localization and distribution of the breast cancer resistance protein transporter in normal human tissues. Cancer Res. 2001, 61 (8): 3458-3464.
- Giacomini KM, Huang SM, Tweedie DJ, Benet LZ, Brouwer KL, Chu X, Dahlin A, Evers R, Fischer V, Hillgren KM: Membrane transporters in drug development. Nat Rev Drug Discov. 2010, 9 (3): 215-236. 10.1038/nrd3028.
- Rosenberg MF, Bikadi Z, Chan J, Liu X, Ni Z, Cai X, Ford RC, Mao Q: The human breast cancer resistance protein (BCRP/ABCG2) shows conformational changes with mitoxantrone. Structure. 2010, 18 (4): 482-493. 10.1016/j.str.2010.01.017.
- Gandhi YA, Morris ME: Structure-activity relationships and quantitative structure-activity relationships for breast cancer resistance protein (ABCG2). AAPS J. 2009, 11 (3): 541-552. 10.1208/s12248-009-9132-1.
- Ishikawa T, Hirano H, Saito H, Sano K, Ikegami Y, Yamaotsu N, Hirono S: Quantitative structure-activity relationship (QSAR) analysis to predict drug-drug interactions of ABC transporter ABCG2. Mini Rev Med Chem. 2012, 12 (6): 505-514. 10.2174/138955712800493825.
- Nicolle E, Boumendjel A, Macalou S, Genoux E, Ahmed-Belkacem A, Carrupt PA, Di Pietro A: QSAR analysis and molecular modeling of ABCG2-specific inhibitors. Adv Drug Deliv Rev. 2009, 61 (1): 34-46. 10.1016/j.addr.2008.10.004.
- Zhang S, Yang X, Coburn RA, Morris ME: Structure activity relationships and quantitative structure activity relationships for the flavonoid-mediated inhibition of breast cancer resistance protein. Biochem Pharmacol. 2005, 70 (4): 627-639. 10.1016/j.bcp.2005.05.017.
- van Loevezijn A, Allen JD, Schinkel AH, Koomen GJ: Inhibition of BCRP-mediated drug efflux by fumitremorgin-type indolyl diketopiperazines. Bioorg Med Chem Lett. 2001, 11 (1): 29-32. 10.1016/S0960-894X(00)00588-6.
- Matsson P, Englund G, Ahlin G, Bergstrom CA, Norinder U, Artursson P: A global drug inhibition pattern for the human ATP-binding cassette transporter breast cancer resistance protein (ABCG2). J Pharmacol Exp Ther. 2007, 323 (1): 19-30. 10.1124/jpet.107.124768.
- Cramer J, Kopp S, Bates SE, Chiba P, Ecker GF: Multispecificity of drug transporters: probing inhibitor selectivity for the human drug efflux transporters ABCB1 and ABCG2. ChemMedChem. 2007, 2 (12): 1783-1788. 10.1002/cmdc.200700160.
- Pick A, Muller H, Wiese M: Structure-activity relationships of new inhibitors of breast cancer resistance protein (ABCG2). Bioorg Med Chem. 2008, 16 (17): 8224-8236. 10.1016/j.bmc.2008.07.034.
- Ahmed-Belkacem A, Pozza A, Munoz-Martinez F, Bates SE, Castanys S, Gamarro F, Di Pietro A, Perez-Victoria JM: Flavonoid structure-activity studies identify 6-prenylchrysin and tectochrysin as potent and specific inhibitors of breast cancer resistance protein ABCG2. Cancer Res. 2005, 65 (11): 4852-4860. 10.1158/0008-5472.CAN-04-1817.
- Ahmed-Belkacem A, Macalou S, Borrelli F, Capasso R, Fattorusso E, Taglialatela-Scafati O, Di Pietro A: Nonprenylated rotenoids, a new class of potent breast cancer resistance protein inhibitors. J Med Chem. 2007, 50 (8): 1933-1938. 10.1021/jm061450q.
- Nakagawa H, Saito H, Ikegami Y, Aida-Hyugaji S, Sawada S, Ishikawa T: Molecular modeling of new camptothecin analogues to circumvent ABCG2-mediated drug resistance in cancer. Cancer Lett. 2006, 234 (1): 81-89. 10.1016/j.canlet.2005.05.052.
- Wang Z, Chen Y, Liang H, Bender A, Glen RC, Yan A: P-glycoprotein substrate models using support vector machines based on a comprehensive data set. J Chem Inf Model. 2011, 51 (6): 1447-1456. 10.1021/ci2001583.
- Xue Y, Yap CW, Sun LZ, Cao ZW, Wang JF, Chen YZ: Prediction of P-glycoprotein substrates by a support vector machine approach. J Chem Inf Comput Sci. 2004, 44 (4): 1497-1505. 10.1021/ci049971e.
- Huang J, Ma G, Muhammad I, Cheng Y: Identifying P-glycoprotein substrates using a support vector machine optimized by a particle swarm. J Chem Inf Model. 2007, 47 (4): 1638-1647. 10.1021/ci700083n.
- Bikadi Z, Hazai I, Malik D, Jemnitz K, Veres Z, Hari P, Ni Z, Loo TW, Clarke DM, Hazai E: Predicting P-glycoprotein-mediated drug transport based on support vector machine and three-dimensional crystal structure of P-glycoprotein. PLoS One. 2011, 6 (10): e25815. 10.1371/journal.pone.0025815.
- Zhong L, Ma CY, Zhang H, Yang LJ, Wan HL, Xie QQ, Li LL, Yang SY: A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. Comput Biol Med. 2011, 41 (11): 1006-1013. 10.1016/j.compbiomed.2011.08.009.
- Mishra NK, Agarwal S, Raghava GP: Prediction of cytochrome P450 isoform responsible for metabolizing a drug molecule. BMC Pharmacol. 2010, 10: 8.
- Honjo Y, Hrycyna CA, Yan QW, Medina-Perez WY, Robey RW, van de Laar A, Litman T, Dean M, Bates SE: Acquired mutations in the MXR/BCRP/ABCP gene alter substrate specificity in MXR/BCRP/ABCP-overexpressing cells. Cancer Res. 2001, 61 (18): 6635-6639.
- Robey RW, Honjo Y, Morisaki K, Nadjem TA, Runge S, Risbood M, Poruchynsky MS, Bates SE: Mutations at amino-acid 482 in the ABCG2 gene affect substrate and antagonist specificity. Br J Cancer. 2003, 89 (10): 1971-1978. 10.1038/sj.bjc.6601370.
- Ozvegy-Laczka C, Koblos G, Sarkadi B, Varadi A: Single amino acid (482) variants of the ABCG2 multidrug transporter: major differences in transport capacity and substrate recognition. Biochim Biophys Acta. 2005, 1668 (1): 53-63. 10.1016/j.bbamem.2004.11.005.
- Gasteiger J, Marsili M: Iterative partial equalization of orbital electronegativity - a rapid access to atomic charges. Tetrahedron. 1980, 36: 3219-3228. 10.1016/0040-4020(80)80168-2.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.