- Oral presentation
Computational purification of tumor gene expression data
BMC Bioinformatics volume 12, Article number: A9 (2011)
Cancer gene expression profiling is an indispensable tool for identifying drivers of tumor progression and tumor subtypes, and for predicting clinical outcome. An outstanding challenge for cancer gene expression studies is the limited concordance between studies, driven in part by a lack of statistical power. Part of this lack of power arises because tumor samples from some solid cancers contain between 30% and 70% healthy tissue. This healthy tissue contaminates tumor expression profiles, and variable amounts of healthy tissue lead to increased variability between tumor expression profiles. Physical purification of these tumor samples before profiling is often not feasible.
Materials and methods
We have developed ISOpure, a computational method that purifies tumor gene expression profiles by using reference samples of healthy tissue to model the healthy-tissue contribution. For every tumor expression profile in the input, ISOpure estimates the percentage of cancerous tissue and outputs a purified cancer expression profile from which the impact of healthy tissue has been removed. We verified our purification procedure by measuring the performance of expression-based predictive models of patient outcome in cancer, using either the original or the ISOpure-purified expression profiles. We predicted extraprostatic extension (EPE) in 89 prostate tumor samples and patient survival in a set of 443 lung cancer patients.
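The idea of removing a healthy-tissue contribution can be illustrated with a deliberately simplified linear mixing model, in which a measured tumor profile is assumed to equal cancer_fraction × cancer profile + (1 − cancer_fraction) × healthy profile. This is a sketch under that assumption only and is not the ISOpure algorithm, whose statistical model is not described in this abstract. The function names `mean_profile`, `max_healthy_fraction`, and `purify` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def mean_profile(reference_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Average several healthy reference profiles gene by gene."""
    n = len(reference_profiles)
    return [sum(gene_values) / n for gene_values in zip(*reference_profiles)]


def max_healthy_fraction(tumor: Sequence[float], healthy: Sequence[float]) -> float:
    """Upper bound on the healthy fraction under the assumed linear mixture.

    The bound is the largest f for which tumor - f * healthy stays
    non-negative for every gene that is expressed in healthy tissue.
    """
    ratios = [t / h for t, h in zip(tumor, healthy) if h > 0]
    return min(1.0, min(ratios, default=1.0))


def purify(tumor: Sequence[float], healthy: Sequence[float],
           cancer_fraction: float) -> List[float]:
    """Subtract the healthy contribution and rescale to a pure cancer profile.

    Assumes tumor = cancer_fraction * cancer + (1 - cancer_fraction) * healthy.
    Negative values caused by noise are clipped to zero.
    """
    if not 0.0 < cancer_fraction <= 1.0:
        raise ValueError("cancer_fraction must be in (0, 1]")
    healthy_fraction = 1.0 - cancer_fraction
    return [max(0.0, (t - healthy_fraction * h) / cancer_fraction)
            for t, h in zip(tumor, healthy)]


if __name__ == "__main__":
    # Toy data: two healthy references and one tumor sample over three genes.
    healthy = mean_profile([[10.0, 20.0, 30.0], [14.0, 20.0, 34.0]])
    tumor = [34.8, 8.0, 17.6]
    print(max_healthy_fraction(tumor, healthy))
    print(purify(tumor, healthy, cancer_fraction=0.6))
```

In this toy setting the cancer fraction is supplied by the caller. A practical method has to estimate it from the data for each patient, which is the harder part of the problem and is not attempted here.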
Results and conclusions
Purified expression profiles showed significant improvements in prognostic model performance. 93% of the EPE classifiers constructed using the purified profiles had higher accuracy on held-out data in cross-validation than the matching classifier trained using the original expression data (p = 1.58 × 10^-77), with an average improvement of 11% in performance (Fig. 1). For lung cancer, the prognostic model based on the purified profiles improved hazard modeling by 39% over the model based on the unpurified profiles (p = 0.016).
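A paired comparison of this kind can be summarized as the fraction of matched classifier pairs in which the purified version wins, together with a significance test on the wins and losses. The abstract does not say which test produced the reported p-value, so the one-sided sign test below is an assumption made for illustration, and `paired_improvement` is a hypothetical helper name.

```python
from math import comb
from typing import Sequence, Tuple


def paired_improvement(original_acc: Sequence[float],
                       purified_acc: Sequence[float]) -> Tuple[float, float, float]:
    """Compare matched classifiers trained on original vs. purified profiles.

    Returns (fraction of pairs where purified is better,
             mean accuracy gain, one-sided sign-test p-value).
    The sign test is an assumed choice, not the test used in the source.
    """
    if len(original_acc) != len(purified_acc) or not original_acc:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - o for o, p in zip(original_acc, purified_acc)]
    wins = sum(d > 0 for d in diffs)
    losses = sum(d < 0 for d in diffs)
    n = wins + losses  # ties carry no information for the sign test
    if n == 0:
        p_value = 1.0
    else:
        # P(X >= wins) for X ~ Binomial(n, 0.5), computed with exact integers.
        p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins / len(diffs), sum(diffs) / len(diffs), p_value


if __name__ == "__main__":
    # Made-up held-out accuracies for four matched classifier pairs.
    original = [0.60, 0.62, 0.58, 0.65]
    purified = [0.70, 0.71, 0.69, 0.64]
    print(paired_improvement(original, purified))
```

The accuracies in the usage example are invented toy values, not results from the study.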
We have demonstrated that ISOpure improves our ability to predict patient phenotype based on gene expression, and expect to see similar improvements for other cancer gene expression analyses such as subtype identification and classification. We are currently generating a compendium of purified gene expression profiles from 1600 tumor samples representing 15 different types of solid cancer using archival data from GEO. We are excited to work with the community at large to generate a resource of computationally purified cancer datasets, in order to facilitate more accurate analysis of cancer gene expression.
Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, Tsao MS, Penn LZ, Jurisica I: Prognostic gene signatures for non-small-cell lung cancer. Proc Natl Acad Sci USA 2009, 106(8):2824–2828. 10.1073/pnas.0809444106
Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 2006, 103(15):5923–5928. 10.1073/pnas.0601231103
Wang Y, Xia XQ, Jia Z, Sawyers A, Yao H, Wang-Rodriquez J, Mercola D, McClelland M: In silico estimates of tissue components in surgical samples based on expression profiling data. Cancer Res 2010, 70(16):6448–6455. 10.1158/0008-5472.CAN-10-0021
Quon G, Haider S, Deshwar AG, Cui A, Boutros PC, Morris QD: Patient-specific computational purification of gene expression profiles. Nature Biotechnology 2011. in review
Cite this article
Deshwar, A., Quon, G. & Morris, Q. Computational purification of tumor gene expression data. BMC Bioinformatics 12, A9 (2011). https://doi.org/10.1186/1471-2105-12-S11-A9
- Gene Expression Profile
- Healthy Tissue
- Prognostic Model
- Solid Cancer
- Physical Purification