Computational purification of tumor gene expression data
© Deshwar et al; licensee BioMed Central Ltd. 2011
Published: 21 November 2011
Cancer gene expression profiling is an indispensable tool for identifying drivers of tumor progression, identifying subtypes, and predicting clinical outcome. An outstanding challenge faced by cancer gene expression studies is the limited concordance between studies , driven in part by lack of statistical power . Part of this lack of statistical power is due to the fact that tumor samples from some solid cancers contain between 30%-70% healthy tissue . This healthy tissue contaminates tumor expression profiles and variable amounts of healthy tissue leads to increased variability between tumor expression profiles. Physical purification of these tumor samples before profiling is often not feasible.
Materials and methods
We have developed ISOpure , a computational method to purify tumor gene expression profiles using reference samples of healthy tissue to model the contribution of healthy tissue. For every tumor expression profile in the input, ISOpure estimates the percentage of cancerous tissue and outputs a purified cancer expression profile from which the impact of healthy tissue has been removed. We verified our purification procedure by measuring the performance of expression-based predictive models of patient outcome in cancer, using either the original or ISOpure-purified expression profiles. We predicted extraprostatic extension (EPE) in 89 prostate tumor samples and patient survival for a set of 443 lung cancer patients.
Results and conclusions
We have demonstrated that ISOpure improves our ability to predict patient phenotype based on gene expression, and expect to see similar improvements for other cancer gene expression analyses such as subtype identification and classification. We are currently generating a compendium of purified gene expression profiles from 1600 tumor samples representing 15 different types of solid cancer using archival data from GEO. We are excited to work with the community at large to generate a resource of computationally purified cancer datasets, in order to facilitate more accurate analysis of cancer gene expression.
- Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, Tsao MS, Penn LZ, Jurisica I: Prognostic gene signatures for non-small-cell lung cancer. Proc Natl Acad Sci USA 2009, 106(8):2824–2828. 10.1073/pnas.0809444106PubMed CentralView ArticlePubMedGoogle Scholar
- Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 2006, 103(15):5923–5928. 10.1073/pnas.0601231103PubMed CentralView ArticlePubMedGoogle Scholar
- Wang Y, Xia XQ, Jia Z, Sawyers A, Yao H, Wang-Rodriquez J, Mercola D, McClelland M: In silico estimates of tissue components in surgical samples based on expression profiling data. Cancer Res 2010, 70(16):6448–6455. 10.1158/0008-5472.CAN-10-0021PubMed CentralView ArticlePubMedGoogle Scholar
- Quon C, Haider S, Deshwar AG, Cui A, Boutros PC, Morris QD: Patient-specific computational purification of gene expression profiles. Nature Biotechnology 2011. in review in reviewGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.