Applications of proteochemometrics - from species extrapolation to cell line sensitivity modelling
BMC Bioinformatics volume 16, Article number: A4 (2015)
Proteochemometrics (PCM) is a predictive bioactivity modelling method that simultaneously models the bioactivity of multiple ligands against multiple targets. PCM makes it possible to explore the selectivity and promiscuity of ligands on biomolecular systems of differing complexity, from proteins to cell-line models [1, 2]. The suitability of PCM for predicting compound polypharmacology has been validated both retrospectively and prospectively through experimental validation [1, 2]. In practice, each ligand-target interaction is encoded by concatenating a ligand descriptor vector and a target descriptor vector, and all such interactions are used to train a single machine learning model. Because the model includes both chemical and target information, it can interpolate and extrapolate in both chemical and biological space. Consequently, PCM can predict compound bioactivities on targets not present during training.
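The encoding step can be illustrated with a minimal Python/NumPy sketch. camb itself is an R package, and nothing below uses its API. The sketch makes several assumptions:

- The descriptor matrices are random placeholders, not real ligand or target descriptors.
- The bioactivity values are synthetic and noise-free.
- Ridge regression stands in for whatever learner is actually used.
- `pcm_features` and `fit_ridge` are hypothetical helper names.

The only point is that one model sees [ligand | target] vectors, so it can score a target that was absent from training.

```python
import numpy as np

def pcm_features(ligand_desc, target_desc, pairs):
    # Each (ligand, target) interaction -> [ligand descriptors | target descriptors]
    return np.array([np.concatenate([ligand_desc[i], target_desc[j]])
                     for i, j in pairs])

def fit_ridge(X, y, alpha=1e-6):
    # Closed-form ridge regression; the intercept is handled by centring X and y
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def predict(X, w, b):
    return X @ w + b

rng = np.random.default_rng(0)
ligands = rng.normal(size=(10, 5))   # hypothetical 5-dim ligand descriptors
targets = rng.normal(size=(5, 3))    # hypothetical 3-dim target descriptors
a, c = rng.normal(size=5), rng.normal(size=3)

def true_activity(i, j):
    # Synthetic, noise-free, additive "bioactivity" (illustration only)
    return ligands[i] @ a + targets[j] @ c

train_pairs = [(i, j) for i in range(10) for j in range(4)]  # targets 0..3 only
test_pairs = [(i, 4) for i in range(10)]                      # target 4 never seen

X_train = pcm_features(ligands, targets, train_pairs)
y_train = np.array([true_activity(i, j) for i, j in train_pairs])
w, b = fit_ridge(X_train, y_train)

X_test = pcm_features(ligands, targets, test_pairs)
y_pred = predict(X_test, w, b)
```

In this toy example the activities are exactly additive and linear in the descriptors, so the held-out target is recovered almost perfectly. Real bioactivity data will not behave this cleanly, and a non-linear learner may be needed.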
In this contribution, we present a methodological advance in the field: we show how Bayesian inference (Gaussian Processes) can be applied in the context of PCM for (i) predicting compound bioactivity together with an error estimate for each prediction; (ii) determining the applicability domain of a PCM model; and (iii) incorporating the experimental uncertainty of bioactivity measurements. We illustrate how PCM can be useful in medicinal chemistry for concomitantly optimizing compound selectivity and potency in two application scenarios: (a) modelling isoform-selective cyclooxygenase inhibition; and (b) large-scale cancer cell line drug sensitivity prediction, where we benchmark the predictive signal of basal gene expression, gene copy-number variation, exome sequencing, and protein abundance data. We present the R package Chemically Aware Model Builder (camb), which can perform the above-mentioned modelling tasks. camb is an open-source platform for generating Structure-Activity and Structure-Property models. Its functionality includes: (i) standardisation of chemical structure representations, (ii) calculation of 905 one-dimensional descriptors and 14 fingerprints for small molecules, (iii) calculation of 8 types of amino acid descriptors, (iv) calculation of 13 whole protein sequence descriptors, and (v) training, validation and visualization of predictive models.
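The three uses of Gaussian Processes listed above can be sketched with plain NumPy: a predictive mean with an error bar, a per-datapoint noise term, and a variance-based domain check. This is generic GP regression, not the authors' implementation and not camb's API. The sketch assumes the following:

- The kernel is squared-exponential, with fixed, hand-picked hyperparameters. No marginal-likelihood optimisation is performed.
- Each entry of `noise_var` is an assumed experimental variance for one measurement. This is how measurement uncertainty enters the model here.
- The applicability-domain cut-off of 0.5 on the predictive standard deviation is a hypothetical choice.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_fit_predict(X, y, X_new, noise_var, length_scale=1.0, variance=1.0):
    """GP regression with a separate noise variance for each observation.

    noise_var[i] is the assumed experimental variance of y[i]. Returns the
    predictive mean and standard deviation of the latent function at X_new.
    """
    y_mean = y.mean()
    K = rbf_kernel(X, X, length_scale, variance) + np.diag(noise_var)
    L = np.linalg.cholesky(K)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    K_s = rbf_kernel(X, X_new, length_scale, variance)
    mean = K_s.T @ weights + y_mean
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)   # k(x*, x*) = variance for this kernel
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(30, 2))          # hypothetical PCM feature vectors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]           # synthetic activities
noise_var = np.full(30, 0.01)                 # assumed measurement variance per datapoint
X_new = np.vstack([X[0], [10.0, 10.0]])       # a training point vs a far-away point
mean, std = gp_fit_predict(X, y, X_new, noise_var)
in_domain = std < 0.5                         # hypothetical applicability-domain cut-off
```

The far-away point gets a predictive standard deviation close to the prior value of 1, so it is flagged as outside the domain. Its mean falls back to the training average. Raising the noise variance of a measurement widens the error bar near that datapoint.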
Overall, applying PCM in these two scenarios leads us to conclude that, on these data, PCM is a suitable technique for modelling the activity of ligands with diverse bioactivity profiles across a panel of targets, ranging from protein binding sites (a) to cancer cell lines (b). The camb package provides a platform covering all steps in generating predictive models from chemical structures and their associated bioactivities or properties. It will facilitate reproducibility and simplify the generation of predictive bioactivity and property models.
van Westen GJP, Wegner JK, Ijzerman AP, van Vlijmen HWT, Bender A: Proteochemometric Modeling as a Tool to Design Selective Compounds and for Extrapolating to Novel Targets. Med Chem Commun. 2011, 2: 16-30. 10.1039/c0md00165a.
Cortes-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Mendez-Lucio O, Ijzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP, Bender A: Polypharmacology Modelling Using Proteochemometrics (PCM): Recent Methodological Developments, Applications to Target Families, and Future Prospects. Med Chem Commun.
van Westen GJP, Wegner JK, Geluykens P, Kwanten L, Vereycken I, Peeters A, Ijzerman AP, van Vlijmen HWT, Bender A: Which Compound to Select in Lead Optimization? Prospectively Validated Proteochemometric Models Guide Preclinical Development. PLoS ONE. 2011, 6: e27518. 10.1371/journal.pone.0027518.
Cortes-Ciriano I, van Westen GJP, Lenselink EB, Murrell DS, Bender A, Malliavin TE: Proteochemometric Modelling in a Bayesian framework. J Cheminf. 2014, 6: 35. 10.1186/1758-2946-6-35.
Murrell DS, Cortes-Ciriano I, van Westen GJP, Stott IP, Bender A, Malliavin TE, Glen RC: Chemically Aware Model Builder (camb): An R package for property and bioactivity modeling of small molecules. [http://www.github.com/cambDI/camb]
Cite this article
Cortes-Ciriano, I., van Westen, G.J., Murrell, D.S. et al. Applications of proteochemometrics - from species extrapolation to cell line sensitivity modelling. BMC Bioinformatics 16 (Suppl 3), A4 (2015). https://doi.org/10.1186/1471-2105-16-S3-A4