  • Poster presentation
  • Open access
  • Published:

Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data?

Background

High-throughput methods such as gene expression microarrays and DNA methylation arrays are used to measure molecular variation in whole blood associated with exposures, treatments, phenotypes or clinical outcomes, and such measurements can be confounded by cellular heterogeneity [1, 2]. Several algorithms have been developed to estimate or adjust for this heterogeneity. However, it is unknown whether these approaches give consistent results and, if not, which method(s) perform better.

Materials and methods

The data used in this study came from the Taiwan Maternal and Infant Cohort Study [3, 4]. We compared five cell-type correction methods: four recently proposed methods (the method implemented in the minfi R package [5], the method of Houseman et al. [6], FaST-LMM-EWASher [7], and RefFreeEWAS [8]) and one method based on surrogate variable analysis (SVA) [9]. For each CpG site across the genome, the association between DNA methylation and the maternal arsenic exposure level was assessed while adjusting for the cell types estimated by each method.

To further evaluate the methods that do not require reference cell-type data, we first simulated DNA methylation at 150 CpG sites across 600 samples, with methylation associated both with a variable of interest (e.g., arsenic exposure level) and with a set of latent variables representing “cell types”. We then simulated DNA methylation at additional CpG sites that were associated only with the latent variables.
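The simulation design and the per-site adjusted association test can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the number of additional null sites (350), the number of latent variables (3), the effect size, the Gaussian noise, and the dependence of exposure on the first latent variable (which makes the latent "cell types" confounders) are all assumptions. The hypothetical helper `ols_pvalue` fits one linear model per CpG site, adjusting for whatever covariates it is given, and uses a normal approximation to the t distribution for the p-value.

```python
import math
import random


def simulate_methylation(n_samples=600, n_assoc=150, n_null=350, n_latent=3,
                         effect=0.5, seed=1):
    """Simulate methylation driven by an exposure x and latent 'cell types' z.

    The first n_assoc sites depend on x and z; the remaining n_null sites depend
    only on z. Effect sizes, noise and the x-z dependence are assumptions.
    """
    rng = random.Random(seed)
    z = [[rng.gauss(0, 1) for _ in range(n_latent)] for _ in range(n_samples)]
    # Assumed confounding: exposure partly driven by the first latent variable.
    x = [0.5 * zi[0] + rng.gauss(0, 1) for zi in z]
    n_sites = n_assoc + n_null
    loadings = [[rng.gauss(0, 1) for _ in range(n_latent)] for _ in range(n_sites)]
    data = []  # data[site][sample]
    for j in range(n_sites):
        beta_x = effect if j < n_assoc else 0.0
        data.append([beta_x * x[i]
                     + sum(g * zk for g, zk in zip(loadings[j], z[i]))
                     + rng.gauss(0, 1)
                     for i in range(n_samples)])
    truth = [j < n_assoc for j in range(n_sites)]
    return x, z, data, truth


def ols_pvalue(y, x, covariates):
    """Two-sided p-value for the x coefficient in y ~ 1 + x + covariates."""
    rows = [[1.0, xi] + list(ci) for xi, ci in zip(x, covariates)]
    p = len(rows[0])
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for r in range(p):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    inv = [row[p:] for row in aug]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - p)
    se = math.sqrt(sigma2 * inv[1][1])
    # Normal approximation: two-sided p = erfc(|z| / sqrt(2)).
    return math.erfc(abs(coef[1] / se) / math.sqrt(2))
```

Here the true simulated `z` can be passed as `covariates` as an oracle adjustment; in the comparison described above, `covariates` would instead hold each method's estimated cell-type proportions or latent factors.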

Results

Without adjustment for cell types, only 3 CpG sites were significantly associated with maternal arsenic exposure at a false discovery rate (FDR) of 0.05. After adjustment with FaST-LMM-EWASher, no CpG sites were significant. For the other methods, the overlap of identified CpG sites is shown in Figure 1. Further simulation studies of the methods that do not require reference data (FaST-LMM-EWASher, RefFreeEWAS, and SVA) showed that RefFreeEWAS and SVA had good and comparable sensitivity and specificity, whereas FaST-LMM-EWASher had the lowest sensitivity but the highest specificity (Table 1).
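The FDR threshold of 0.05 controls for testing many CpG sites at once. The abstract does not name the procedure used; assuming the widely used Benjamini-Hochberg step-up procedure, a minimal stdlib sketch is:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return True for each site declared significant by the Benjamini-Hochberg
    step-up procedure at FDR level q, False otherwise."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    # All sites with rank <= k_max are significant, even if an earlier
    # individual comparison failed (this is the "step-up" behaviour).
    significant = [False] * m
    for idx in order[:k_max]:
        significant[idx] = True
    return significant
```

Applied to the per-site p-values from each adjusted model, the number of `True` entries is the count of significant CpG sites for that method.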

Figure 1

Venn diagram showing the overlap of CpG sites significant at an FDR of 0.05 after adjusting for cell types with the different methods, in the association study of maternal arsenic exposure and DNA methylation.
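The regions of a Venn diagram such as Figure 1 can be counted directly from each method's set of significant CpG sites. A minimal sketch: each site is assigned to the region whose methods are exactly those that called it significant. Method names would follow the text; any CpG identifiers passed in are placeholders, not results from this study.

```python
def venn_regions(sets_by_method):
    """Count sites in each exclusive Venn region.

    sets_by_method maps a method name to its set of significant site IDs.
    Result keys are sorted tuples of method names; values are site counts.
    """
    regions = {}
    all_sites = set().union(*sets_by_method.values())
    for site in all_sites:
        key = tuple(sorted(m for m, s in sets_by_method.items() if site in s))
        regions[key] = regions.get(key, 0) + 1
    # Sort by key so the output order does not depend on set iteration order.
    return dict(sorted(regions.items()))
```

For example, `venn_regions({"SVA": {"cg_a", "cg_b"}, "RefFreeEWAS": {"cg_b"}})` places one site in the SVA-only region and one in the region shared by RefFreeEWAS and SVA.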

Table 1 Sensitivity and specificity with respect to the truly associated CpG sites, based on 100 simulated data sets; CI: confidence interval
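Sensitivity is the fraction of truly associated sites that a method calls significant, and specificity is the fraction of non-associated sites that it correctly leaves unselected. The abstract does not state how the CI in Table 1 was built; the sketch below assumes a normal-approximation 95% interval for the mean across simulation replicates, and the helper names are hypothetical.

```python
import math
import statistics


def sensitivity_specificity(called, truth):
    """Sensitivity and specificity of significance calls against true labels."""
    tp = sum(c and t for c, t in zip(called, truth))
    fn = sum((not c) and t for c, t in zip(called, truth))
    tn = sum((not c) and (not t) for c, t in zip(called, truth))
    fp = sum(c and (not t) for c, t in zip(called, truth))
    return tp / (tp + fn), tn / (tn + fp)


def mean_with_ci(values, z=1.96):
    """Mean across replicates with an assumed normal-approximation CI."""
    m = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half
```

Running `sensitivity_specificity` on each of the 100 replicates and passing the resulting lists to `mean_with_ci` gives one summary row per method.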

Conclusions

In the real data, RefFreeEWAS and SVA identified large numbers of CpG sites, and the SVA results agreed most closely with those of all the other approaches. The simulation studies further showed that RefFreeEWAS and SVA perform comparably and better than FaST-LMM-EWASher. Overall, these findings support using SVA to adjust for cell types, given its highest agreement with the other methods and its favorable performance in the simulations.
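The abstract does not say how agreement between methods was quantified. As one hypothetical choice, the agreement of each method could be summarized as its mean pairwise Jaccard index (shared sites divided by the union of sites) with every other method's set of significant CpG sites:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets; defined here as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_agreement(sets_by_method):
    """Average Jaccard index of each method with every other method."""
    scores = {m: [] for m in sets_by_method}
    for m1, m2 in combinations(sets_by_method, 2):
        j = jaccard(sets_by_method[m1], sets_by_method[m2])
        scores[m1].append(j)
        scores[m2].append(j)
    return {m: sum(v) / len(v) for m, v in scores.items() if v}
```

Under this measure, a method that finds no significant sites (as with FaST-LMM-EWASher in the real data) scores 0 against any method that finds at least one.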

References

  1. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, Smith AV, Gudnason V: Heterogeneity in white blood cells has potential to confound DNA methylation measurements. PLoS One. 2012, 7 (10): e46705.

  2. Talens RP, Boomsma DI, Tobi EW, Kremer D, Jukema JW, Willemsen G, Putter H, Slagboom PE, Heijmans BT: Variation, patterns, and temporal stability of DNA methylation: considerations for epigenetic epidemiology. FASEB Journal. 2010, 24 (9): 3135-3144.

  3. Lin L-C, Wang S-L, Chang Y-C, Huang P-C, Cheng J-T, Su P-H, Liao P-C: Associations between maternal phthalate exposure and cord sex hormones in human infants. Chemosphere. 2011, 83 (8): 1192-1199.

  4. Wang S-L, Su P-H, Jong S-B, Guo YL, Chou W-L, Päpke O: In utero exposure to dioxins and polychlorinated biphenyls and its relations to thyroid function and growth hormone in newborns. Environmental Health Perspectives. 2005, 1645-1650.

  5. Jaffe AE, Irizarry RA: Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biology. 2014, 15 (2): R31.

  6. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT: DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012, 13: 86.

  7. Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J: Epigenome-wide association studies without the need for cell-type composition. Nature Methods. 2014, 11 (3): 309-311.

  8. Houseman EA, Molitor J, Marsit CJ: Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014, 30 (10): 1431-1439.

  9. Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics. 2007, 3 (9): e161.


Author information



Corresponding author

Correspondence to Hongmei Zhang.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Kaushal, A., Zhang, H., Karmaus, W.J. et al. Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data? BMC Bioinformatics 16 (Suppl 15), P7 (2015).
