

  • Oral presentation
  • Open Access

Computational purification of tumor gene expression data

BMC Bioinformatics 2011, 12(Suppl 11):A9

  • Published:


  • Gene Expression Profile
  • Healthy Tissue
  • Prognostic Model
  • Solid Cancer
  • Physical Purification


Cancer gene expression profiling is an indispensable tool for identifying drivers of tumor progression, identifying subtypes, and predicting clinical outcome. An outstanding challenge faced by cancer gene expression studies is the limited concordance between studies [1], driven in part by a lack of statistical power [2]. This lack of power arises partly because tumor samples from some solid cancers contain between 30% and 70% healthy tissue [3]. This healthy tissue contaminates tumor expression profiles, and variable amounts of healthy tissue lead to increased variability between tumor expression profiles. Physical purification of these tumor samples before profiling is often not feasible.

Materials and methods

We have developed ISOpure [4], a computational method to purify tumor gene expression profiles using reference samples of healthy tissue to model the contribution of healthy tissue. For every tumor expression profile in the input, ISOpure estimates the percentage of cancerous tissue and outputs a purified cancer expression profile from which the impact of healthy tissue has been removed. We verified our purification procedure by measuring the performance of expression-based predictive models of patient outcome in cancer, using either the original or ISOpure-purified expression profiles. We predicted extraprostatic extension (EPE) in 89 prostate tumor samples and patient survival for a set of 443 lung cancer patients.
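The idea of removing a healthy-tissue contribution can be illustrated with a deliberately simplified linear mixture. This is a sketch under stated assumptions, not the ISOpure model described in [4]: it assumes each tumor profile is a weighted sum of one cancer profile and the mean of the healthy reference profiles on a linear (not log) expression scale, and it takes the cancer fraction as an input instead of estimating it statistically. The function names `purify` and `healthy_fraction_upper_bound` and all numbers are hypothetical.

```python
import numpy as np


def healthy_fraction_upper_bound(tumor, healthy_mean):
    """Largest healthy fraction that keeps every purified value non-negative.

    Assumption (not from the abstract): tumor = a * cancer + (1 - a) * healthy_mean.
    """
    expressed = healthy_mean > 0
    ratios = tumor[expressed] / healthy_mean[expressed]
    return float(min(1.0, ratios.min()))


def purify(tumor, healthy_refs, cancer_fraction):
    """Subtract the assumed healthy contribution from one tumor profile."""
    if not 0.0 < cancer_fraction <= 1.0:
        raise ValueError("cancer_fraction must be in (0, 1]")
    healthy_mean = np.mean(healthy_refs, axis=0)
    cancer = (tumor - (1.0 - cancer_fraction) * healthy_mean) / cancer_fraction
    # Clip small negative values caused by noise or rounding.
    return np.clip(cancer, 0.0, None)


# Synthetic data (made up for illustration): 2 healthy references, 4 genes.
healthy_refs = np.array([[10.0, 4.0, 6.0, 2.0],
                         [12.0, 6.0, 4.0, 2.0]])
healthy_mean = healthy_refs.mean(axis=0)
true_cancer = np.array([0.0, 20.0, 5.0, 8.0])
true_fraction = 0.6

# Simulate a contaminated tumor sample, then undo the contamination.
tumor = true_fraction * true_cancer + (1.0 - true_fraction) * healthy_mean
bound = healthy_fraction_upper_bound(tumor, healthy_mean)
purified = purify(tumor, healthy_refs, true_fraction)
```

In this synthetic case the first gene is expressed in healthy tissue but absent from the cancer profile, so the non-negativity bound on the healthy fraction coincides with the true healthy fraction. With real data the bound is only an upper limit, which is one reason a statistical model is needed to estimate the fraction.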

Results and conclusions

Purified expression profiles showed significant improvements in prognostic model performance. 93% of the EPE classifiers constructed using the purified profiles had higher accuracy on held-out data in cross-validation than the matching classifier trained using the original expression data (p = 1.58 × 10^-77), with an average improvement of 11% in performance (Fig. 1). For lung cancer, the prognostic model based on the purified profiles improved hazard modeling by 39% over the model based on the unpurified profiles (p = 0.016).
Figure 1

Density estimate of EPE classifier accuracy using purified and original expression profiles.
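A paired comparison of matched classifiers, like the one summarized above, can be sketched as follows. The abstract does not state which statistical test produced the reported p-value, so the one-sided sign test used here is an assumption chosen for simplicity. The function name `paired_comparison` is hypothetical, and the accuracy values are invented for illustration and are not data from the study. The mean improvement is computed as an absolute difference in accuracy, which may differ from how the reported figure was defined.

```python
from math import comb


def paired_comparison(acc_purified, acc_original):
    """Compare matched classifiers trained on purified vs. original profiles.

    Returns the fraction of pairs where the purified classifier is more
    accurate, the mean accuracy difference, and a one-sided sign-test
    p-value (ties are dropped before testing).
    """
    if len(acc_purified) != len(acc_original) or not acc_purified:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [p - o for p, o in zip(acc_purified, acc_original)]
    wins = sum(d > 0 for d in diffs)
    losses = sum(d < 0 for d in diffs)
    n = wins + losses
    # P(at least `wins` successes in n fair coin flips).
    p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n if n else 1.0
    return {
        "win_fraction": wins / len(diffs),
        "mean_improvement": sum(diffs) / len(diffs),
        "p_value": p_value,
    }


# Invented held-out accuracies for eight matched classifier pairs.
purified_acc = [0.70, 0.75, 0.68, 0.80, 0.72, 0.66, 0.74, 0.71]
original_acc = [0.62, 0.70, 0.69, 0.71, 0.65, 0.60, 0.66, 0.64]
result = paired_comparison(purified_acc, original_acc)
print(result)
```

The sign test uses only the direction of each paired difference, so it makes no assumption about how accuracies are distributed; a test that also uses the magnitudes of the differences would typically have more power.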

We have demonstrated that ISOpure improves our ability to predict patient phenotype based on gene expression, and expect to see similar improvements for other cancer gene expression analyses such as subtype identification and classification. We are currently generating a compendium of purified gene expression profiles from 1600 tumor samples representing 15 different types of solid cancer using archival data from GEO. We are excited to work with the community at large to generate a resource of computationally purified cancer datasets, in order to facilitate more accurate analysis of cancer gene expression.

Authors’ Affiliations

Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
Department of Computer Science, University of Toronto, Toronto, Canada
Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada


  1. Boutros PC, Lau SK, Pintilie M, Liu N, Shepherd FA, Der SD, Tsao MS, Penn LZ, Jurisica I: Prognostic gene signatures for non-small-cell lung cancer. Proc Natl Acad Sci USA 2009, 106(8):2824–2828. doi:10.1073/pnas.0809444106
  2. Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA 2006, 103(15):5923–5928. doi:10.1073/pnas.0601231103
  3. Wang Y, Xia XQ, Jia Z, Sawyers A, Yao H, Wang-Rodriquez J, Mercola D, McClelland M: In silico estimates of tissue components in surgical samples based on expression profiling data. Cancer Res 2010, 70(16):6448–6455. doi:10.1158/0008-5472.CAN-10-0021
  4. Quon C, Haider S, Deshwar AG, Cui A, Boutros PC, Morris QD: Patient-specific computational purification of gene expression profiles. Nature Biotechnology 2011, in review.


© Deshwar et al; licensee BioMed Central Ltd. 2011

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.