Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data?

Background

High-throughput methods such as gene expression microarrays and DNA methylation arrays are used to measure transcriptional and epigenetic variation associated with exposures, treatments, phenotypes or clinical outcomes in whole blood, and these measurements can be confounded by cellular heterogeneity[1, 2]. Several algorithms have been developed to account for this heterogeneity. However, it is unknown whether these approaches give consistent results and, if not, which method(s) perform better.

Materials and methods

The data used in this study were from a Taiwan Maternal and Infant Cohort Study[3, 4]. We compared five cell-type correction methods: four recently proposed methods (the method implemented in the minfi R package[5], the method by Houseman et al.[6], FaST-LMM-EWASher[7] and RefFreeEWAS[8]) and one method using surrogate variable analysis (SVA)[9]. The association of DNA methylation at each CpG site across the whole genome with maternal arsenic exposure level was assessed, adjusting for the estimated cell types. To further evaluate the methods that do not require reference cell types, we first simulated DNA methylation data at 150 CpG sites across 600 samples, based on an association of DNA methylation with a variable of interest (e.g., level of arsenic exposure) and with a set of latent variables representing “cell types”. We then simulated DNA methylation at additional CpG sites that were associated only with the latent variables.
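The simulation design described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' simulation code: the abstract gives the 150 associated CpG sites and 600 samples, but the number of additional sites (`n_null`), the number of latent variables (`n_latent`), the effect size, the noise model, and the function name `simulate_methylation` are all hypothetical choices. Values are generated on an unbounded scale, and the latent variables are drawn independently of the exposure for simplicity.

```python
import random


def simulate_methylation(n_samples=600, n_assoc=150, n_null=50,
                         n_latent=3, effect=0.5, seed=0):
    """Simulate methylation-like values on an unbounded scale.

    The first n_assoc sites depend on the exposure and the latent variables;
    the remaining n_null sites depend on the latent variables only.
    n_null, n_latent and effect are assumed values, not from the abstract.
    """
    rng = random.Random(seed)
    # Variable of interest (e.g., an exposure level), one value per sample.
    exposure = [rng.gauss(0, 1) for _ in range(n_samples)]
    # Latent variables standing in for "cell types", one row per sample.
    latent = [[rng.gauss(0, 1) for _ in range(n_latent)]
              for _ in range(n_samples)]

    sites = []
    for j in range(n_assoc + n_null):
        beta = effect if j < n_assoc else 0.0  # exposure effect
        gamma = [rng.gauss(0, 0.5) for _ in range(n_latent)]  # latent effects
        column = [
            beta * exposure[i]
            + sum(g * z for g, z in zip(gamma, latent[i]))
            + rng.gauss(0, 1)  # residual noise
            for i in range(n_samples)
        ]
        sites.append(column)

    # Ground truth: which sites are truly associated with the exposure.
    truth = [j < n_assoc for j in range(n_assoc + n_null)]
    return exposure, latent, sites, truth
```

Because the ground truth is returned alongside the data, the sites called significant by any adjustment method can be scored against it.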

Results

Without adjusting for cell types, only 3 CpG sites showed significant associations with maternal arsenic exposure at a false discovery rate (FDR) level of 0.05. After adjustment by FaST-LMM-EWASher, no CpG sites were identified. For the other methods, Figure 1 illustrates the overlap of identified CpG sites. Further simulation studies on the reference-free methods (i.e., FaST-LMM-EWASher, RefFreeEWAS, and SVA) revealed that RefFreeEWAS and SVA provided good and comparable sensitivities and specificities, whereas FaST-LMM-EWASher gave the lowest sensitivity but the highest specificity (Table 1).
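For readers unfamiliar with FDR control, the sketch below shows how a set of per-site p-values can be thresholded at an FDR level of 0.05. It assumes the Benjamini-Hochberg step-up procedure; the abstract does not state which FDR procedure was used, and the function name is illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    Assumes the Benjamini-Hochberg step-up rule: find the largest rank k
    (1-based, in ascending p-value order) with p_(k) <= (k / m) * alpha,
    then reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank  # keep the largest rank that passes
    return sorted(order[:cutoff])
```

Note that the rule is step-up: a p-value that fails its own threshold is still rejected if a larger p-value passes at a higher rank.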

Figure 1 Venn diagram illustrating the overlap of significant CpG sites at FDR level of 0.05 after adjusting for cell types by different methods for the association study of maternal arsenic exposure with DNA methylation.
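The counts behind such a Venn diagram come from set intersections of the significant CpG sites reported by each method. The sketch below computes pairwise overlaps; the function name, method labels and CpG identifiers are hypothetical placeholders, not results from the study.

```python
from itertools import combinations


def pairwise_overlap(hits):
    """Count shared significant sites for every pair of methods.

    hits: dict mapping a method name to the set of CpG IDs it reported.
    Returns a dict keyed by (method_a, method_b) in alphabetical order.
    """
    return {
        (a, b): len(hits[a] & hits[b])
        for a, b in combinations(sorted(hits), 2)
    }


# Placeholder inputs for illustration only.
example_hits = {
    "method_A": {"cpg_1", "cpg_2", "cpg_3"},
    "method_B": {"cpg_2", "cpg_3", "cpg_4"},
    "method_C": {"cpg_3"},
}
example_overlap = pairwise_overlap(example_hits)
```

A method whose significant sites are largely shared with every other method would show consistently high counts across its pairs.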

Table 1 Sensitivity and specificity with respect to truly identified variables using 100 simulated data sets; CI: confidence interval
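Sensitivity and specificity of this kind are computed by comparing the sites a method calls significant with the sites known to be associated in the simulation, then summarizing across simulated data sets. The sketch below is a minimal illustration; the normal-approximation 95% interval is an assumption, since the abstract does not state how the confidence intervals were obtained, and the function names are illustrative.

```python
import math
import statistics


def sensitivity_specificity(truth, called):
    """Compare boolean truth labels with boolean calls, site by site."""
    tp = sum(1 for t, c in zip(truth, called) if t and c)
    fn = sum(1 for t, c in zip(truth, called) if t and not c)
    tn = sum(1 for t, c in zip(truth, called) if not t and not c)
    fp = sum(1 for t, c in zip(truth, called) if not t and c)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation interval across replicates.

    Assumption: interval = mean +/- z * sd / sqrt(n); needs n >= 2.
    """
    mean = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half, mean + half
```

In use, `sensitivity_specificity` would be applied once per simulated data set and the resulting lists passed to `mean_with_ci`.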

Conclusions

The results from real data indicated that RefFreeEWAS and SVA were able to identify a large number of CpG sites, and the results from SVA showed the highest agreement with all other approaches. Simulation studies further confirmed that RefFreeEWAS and SVA are comparable and perform better than FaST-LMM-EWASher. Overall, the findings support a recommendation of SVA for cell-type adjustment, given its highest agreement with the other methods and its favorable performance in the simulation studies.

References

  1. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, Smith AV, Gudnason V: Heterogeneity in white blood cells has potential to confound DNA methylation measurements. PLoS ONE. 2012, 7 (10): e46705.

  2. Talens RP, Boomsma DI, Tobi EW, Kremer D, Jukema JW, Willemsen G, Putter H, Slagboom PE, Heijmans BT: Variation, patterns, and temporal stability of DNA methylation: considerations for epigenetic epidemiology. FASEB journal : official publication of the Federation of American Societies for Experimental Biology. 2010, 24 (9): 3135-3144.

  3. Lin L-C, Wang S-L, Chang Y-C, Huang P-C, Cheng J-T, Su P-H, Liao P-C: Associations between maternal phthalate exposure and cord sex hormones in human infants. Chemosphere. 2011, 83 (8): 1192-1199.

  4. Wang S-L, Su P-H, Jong S-B, Guo YL, Chou W-L, Päpke O: In utero exposure to dioxins and polychlorinated biphenyls and its relations to thyroid function and growth hormone in newborns. Environmental health perspectives. 2005, 1645-1650.

  5. Jaffe AE, Irizarry RA: Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome biology. 2014, 15 (2): R31.

  6. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT: DNA methylation arrays as surrogate measures of cell mixture distribution. BMC bioinformatics. 2012, 13: 86.

  7. Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J: Epigenome-wide association studies without the need for cell-type composition. Nature methods. 2014, 11 (3): 309-311.

  8. Houseman EA, Molitor J, Marsit CJ: Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014, 30 (10): 1431-1439.

  9. Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS genetics. 2007, 3 (9): e161.

Author information

Correspondence to Hongmei Zhang.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Kaushal, A., Zhang, H., Karmaus, W.J. et al. Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data?. BMC Bioinformatics 16, P7 (2015). https://doi.org/10.1186/1471-2105-16-S15-P7

  • DOI: https://doi.org/10.1186/1471-2105-16-S15-P7

Keywords

  • Simulation Study
  • Latent Variable
  • High Agreement
  • Arsenic Exposure
  • High Throughput Method