Integrative biclustering of heterogeneous datasets using a Bayesian nonparametric model with application to chemogenomics
© Li and Rouchka; licensee BioMed Central Ltd. 2011
Published: 5 August 2011
The identification of protein function and the prediction of ligand-target interactions are active research areas, and both are facilitated by categorizing ligands and proteins into biologically sensible groups. Because related drugs can bind to receptors that lack obvious sequence or structural similarity, it is appropriate to categorize proteins based not only on their sequences or structures but also on the chemical structures and phenotypic side effects of their ligands. In chemogenomic studies where the complete set of ligands for a protein is not known a priori, integrating the de novo detection of interacting ligand and protein groups into the categorization process can guide it towards more biologically sensible solutions.
We present the Weighted Infinite Relational Model (WIRM), which jointly detects biologically sensible ligand groups and protein groups by integrating the clustering of several data types, including chemical compound descriptors, protein sequences, ligand-target bindings and pharmaceutical effects. WIRM takes advantage of the Bayesian nonparametric paradigm to integrate multiple data types, to accommodate missing values (e.g. unknown ligand-target interactions), to infer the number of clusters automatically without explicit model comparison, and to predict ligand-target interactions. Because some of these data types may, to varying degrees, suggest relationships that have no bearing on ligand-target interactions or on biologically sensible ligand and protein groups, WIRM allows different data types to carry different weights based on prior knowledge of their quality or relevance.
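To make the ingredients named above concrete, the following is a minimal sketch of how a relational clustering score of this general kind could be computed. It is not the authors' implementation. It assumes a Chinese restaurant process prior over ligand and protein partitions and a Beta-Bernoulli block likelihood, as in the standard Infinite Relational Model, and it assumes that a data-type weight enters as a multiplier on that data type's log-likelihood. The function names and the parameters `alpha`, `beta` and `weight` are hypothetical. Unknown interactions are encoded as `None` and are skipped.

```python
import math
from collections import Counter


def crp_log_prior(assignments, alpha=1.0):
    """Log probability of a partition under a Chinese restaurant process."""
    counts = Counter(assignments)
    n = len(assignments)
    logp = len(counts) * math.log(alpha)
    # lgamma(c) equals log((c - 1)!) for a cluster of size c.
    logp += sum(math.lgamma(c) for c in counts.values())
    logp += math.lgamma(alpha) - math.lgamma(alpha + n)
    return logp


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def relation_log_lik(binding, ligand_z, protein_z, beta=1.0):
    """Beta-Bernoulli marginal log-likelihood of a binary ligand-target matrix.

    Entries are 1 (binds), 0 (does not bind) or None (unknown, ignored).
    """
    ones, zeros = Counter(), Counter()
    for i, row in enumerate(binding):
        for j, value in enumerate(row):
            if value is None:
                continue  # missing interaction: contributes nothing
            block = (ligand_z[i], protein_z[j])
            if value:
                ones[block] += 1
            else:
                zeros[block] += 1
    total = 0.0
    for block in set(ones) | set(zeros):
        total += log_beta(ones[block] + beta, zeros[block] + beta)
        total -= log_beta(beta, beta)
    return total


def weighted_log_score(binding, ligand_z, protein_z, weight=1.0, alpha=1.0):
    """Assumed scoring rule: partition priors plus weighted log-likelihood."""
    prior = crp_log_prior(ligand_z, alpha) + crp_log_prior(protein_z, alpha)
    return prior + weight * relation_log_lik(binding, ligand_z, protein_z)


# Toy binding matrix with two ligand groups, two protein groups and one unknown entry.
binding = [
    [1, 1, 0, 0],
    [1, None, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
aligned = weighted_log_score(binding, [0, 0, 1, 1], [0, 0, 1, 1])
mixed = weighted_log_score(binding, [0, 1, 0, 1], [0, 1, 0, 1])
print(aligned > mixed)
```

In this toy case the partition that matches the block structure of the matrix receives the higher score. Under the assumed scoring rule, setting `weight` to 0 removes the influence of the binding data altogether, which is one simple way to picture what varying a data type's weight does.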
We apply WIRM to ion channel proteins and G-protein-coupled receptors, and validate its performance using functional annotations and ligand-target interactions. We also examine the relationships among the data types by varying the weights, which determine the impact of each data type on the model. The categories and interactions inferred by WIRM both confirm known biology and suggest novel predictions.
This work was supported in part by the National Institutes of Health (P20RR16481, P20RR16481S1, P30ES014443) and the Department of Energy (DE-EM0000197). Its contents are solely the responsibility of the authors and do not represent the official views of NCRR, NIEHS, NIH, or DOE.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.