Volume 16 Supplement 15
Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data?
© Kaushal et al. 2015
Published: 23 October 2015
Background
High-throughput methods such as microarrays and DNA methylation profiling are used to measure transcriptional variation associated with exposures, treatments, phenotypes or clinical outcomes in whole blood, and these measurements can be confounded by cellular heterogeneity [1, 2]. Several algorithms have been developed to estimate and adjust for this cellular heterogeneity. However, it is unknown whether these approaches give consistent results and, if not, which method(s) perform better.
Materials and methods
The data used in this study were from the Taiwan Maternal and Infant Cohort Study [3, 4]. We compared five cell-type correction methods: four recently proposed methods (the method implemented in the minfi R package, the method by Houseman et al., FaST-LMM-EWASher and RefFreeEWAS) and one method using surrogate variable analysis (SVA). The association of DNA methylation at each CpG site across the whole genome with maternal arsenic exposure levels was assessed, adjusting for the estimated cell types. To further demonstrate and evaluate the methods that do not require reference cell types, we first simulated DNA methylation data at 150 CpG sites across 600 samples based on an association of DNA methylation with a variable of interest (e.g., level of arsenic exposure) and a set of latent variables representing “cell types”. We then simulated DNA methylation at additional CpG sites showing association only with the latent variables.
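The simulation design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: it uses a single latent variable instead of a set, an unbounded methylation scale, and invented values for the effect size, the noise level, the correlation between exposure and the latent variable, and the number of additional CpG sites (`n_null`), none of which the abstract specifies. The names `simulate`, `slope`, `residualize` and `adjusted_slope` are hypothetical helpers.

```python
import random
import statistics


def simulate(n_samples=600, n_assoc=150, n_null=150, effect=0.5, seed=1):
    """Simulate one data set with a single latent "cell type" variable (assumed design)."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Assumption: exposure is correlated with the latent variable, which creates confounding.
    exposure = [0.5 * z + rng.gauss(0.0, 1.0) for z in latent]
    sites, truth = [], []
    for j in range(n_assoc + n_null):
        beta = effect if j < n_assoc else 0.0  # the additional sites depend on the latent variable only
        sites.append([beta * x + z + rng.gauss(0.0, 1.0)
                      for x, z in zip(exposure, latent)])
        truth.append(j < n_assoc)
    return exposure, latent, sites, truth


def slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, z):
    """Remove the linear effect of z from y."""
    b = slope(z, y)
    mz, my = statistics.fmean(z), statistics.fmean(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]


def adjusted_slope(x, y, z):
    """Slope of y on x after adjusting both for z (Frisch-Waugh-Lovell result)."""
    return slope(residualize(x, z), residualize(y, z))


exposure, latent, sites, truth = simulate()
null_sites = [s for s, t in zip(sites, truth) if not t]
assoc_sites = [s for s, t in zip(sites, truth) if t]
naive_null = statistics.fmean([slope(exposure, s) for s in null_sites])
adjusted_null = statistics.fmean([adjusted_slope(exposure, s, latent) for s in null_sites])
adjusted_assoc = statistics.fmean([adjusted_slope(exposure, s, latent) for s in assoc_sites])
print(f"null sites: naive slope {naive_null:.3f}, adjusted slope {adjusted_null:.3f}")
```

In this toy setting the latent variable is known, so adjusting for it removes the spurious exposure effect at the latent-only sites. The reference-free methods compared in the study have to estimate such latent structure from the methylation data themselves, which this sketch does not attempt.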
Sensitivity and specificity with respect to truly identified variables across 100 simulated data sets; CI: confidence interval
Sensitivity: Median (95% CI)    Specificity: Median (95% CI)
0.00 (0.00, 0.52)               1.00 (0.99, 1.00)
0.98 (0.00, 1.00)               0.94 (0.93, 1.00)
1.00 (0.98, 1.00)               0.94 (0.93, 0.94)
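For readers who want to reproduce summaries of this kind, the following sketch shows how sensitivity and specificity could be computed for one simulated data set and then summarized across replicates. It rests on assumptions: the abstract does not say how the 95% intervals were obtained, so the empirical 2.5th and 97.5th percentiles used here are only one plausible choice, and the function names are hypothetical.

```python
import statistics


def sensitivity_specificity(truth, called):
    """Compare per-site truth labels with per-site calls from one method."""
    tp = sum(1 for t, c in zip(truth, called) if t and c)
    fn = sum(1 for t, c in zip(truth, called) if t and not c)
    tn = sum(1 for t, c in zip(truth, called) if not t and not c)
    fp = sum(1 for t, c in zip(truth, called) if not t and c)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def median_and_interval(values, lower=0.025, upper=0.975):
    """Median plus an empirical percentile interval (assumed definition of the interval)."""
    ordered = sorted(values)

    def pick(q):
        # Simple floor-based percentile; other percentile conventions would also be reasonable.
        return ordered[int(q * (len(ordered) - 1))]

    return statistics.median(ordered), (pick(lower), pick(upper))


# Toy input, invented for illustration only.
truth = [True, True, False, False]
called = [True, False, False, True]
sens, spec = sensitivity_specificity(truth, called)
summary = median_and_interval([i / 100 for i in range(101)])
```

Applied to 100 simulated data sets, `sensitivity_specificity` would be called once per data set and per method, and `median_and_interval` once per column of the resulting values.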
Results
The results from real data indicated that RefFreeEWAS and SVA were able to identify a large number of CpG sites, and the results from SVA showed the highest agreement with all other approaches. Simulation studies further confirmed that RefFreeEWAS and SVA are comparable and perform better than FaST-LMM-EWASher.
Conclusions
Overall, the findings support a recommendation to use SVA to adjust for cell types, given its highest agreement with the other methods and its favorable performance in the simulation studies.
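The abstract does not state how agreement between methods was quantified. As one possible reading, the sketch below scores each method by its average Jaccard overlap with every other method's set of identified CpG sites. The Jaccard index, the method labels and the site identifiers are all assumptions made for illustration and are not taken from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two site sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_agreement(calls):
    """Average Jaccard overlap of each method with all other methods (needs two or more methods)."""
    totals = {name: 0.0 for name in calls}
    for m1, m2 in combinations(calls, 2):
        score = jaccard(calls[m1], calls[m2])
        totals[m1] += score
        totals[m2] += score
    return {name: total / (len(calls) - 1) for name, total in totals.items()}


# Invented site sets; the labels do not correspond to the methods in the study.
calls = {
    "method_A": {"site1", "site2", "site3"},
    "method_B": {"site2", "site3", "site4"},
    "method_C": {"site3", "site9"},
}
agreement = mean_agreement(calls)
```

A method with the highest value in `agreement` would be the one whose identified sites overlap most, on average, with those of the other methods.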
References
1. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, Smith AV, Gudnason V: Heterogeneity in white blood cells has potential to confound DNA methylation measurements. PloS one. 2012, 7 (10): e46705.
2. Talens RP, Boomsma DI, Tobi EW, Kremer D, Jukema JW, Willemsen G, Putter H, Slagboom PE, Heijmans BT: Variation, patterns, and temporal stability of DNA methylation: considerations for epigenetic epidemiology. FASEB journal : official publication of the Federation of American Societies for Experimental Biology. 2010, 24 (9): 3135-3144.
3. Lin L-C, Wang S-L, Chang Y-C, Huang P-C, Cheng J-T, Su P-H, Liao P-C: Associations between maternal phthalate exposure and cord sex hormones in human infants. Chemosphere. 2011, 83 (8): 1192-1199.
4. Wang S-L, Su P-H, Jong S-B, Guo YL, Chou W-L, Päpke O: In utero exposure to dioxins and polychlorinated biphenyls and its relations to thyroid function and growth hormone in newborns. Environmental health perspectives. 2005, 1645-1650.
5. Jaffe AE, Irizarry RA: Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome biology. 2014, 15 (2): R31.
6. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT: DNA methylation arrays as surrogate measures of cell mixture distribution. BMC bioinformatics. 2012, 13: 86.
7. Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J: Epigenome-wide association studies without the need for cell-type composition. Nature methods. 2014, 11 (3): 309-311.
8. Houseman EA, Molitor J, Marsit CJ: Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014, 30 (10): 1431-1439.
9. Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS genetics. 2007, 3 (9): e161.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.