- Poster presentation
- Open Access
Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data?
BMC Bioinformatics volume 16, Article number: P7 (2015)
Background
High-throughput methods such as microarrays and DNA methylation assays are used to measure transcriptional and epigenetic variation associated with exposures, treatments, phenotypes or clinical outcomes in whole blood. Such measurements can be confounded by cellular heterogeneity [1, 2]. Several algorithms have been developed to account for this cellular heterogeneity. However, it is unknown whether these approaches give consistent results and, if not, which method(s) perform better.
Materials and methods
The data used in this study were from the Taiwan Maternal and Infant Cohort Study [3, 4]. We compared five cell-type correction methods: four recently proposed methods (the method implemented in the minfi R package [5], the method by Houseman et al. [6], FaST-LMM-EWASher [7], and RefFreeEWAS [8]) and one method based on surrogate variable analysis (SVA) [9]. The association of DNA methylation at each CpG site across the whole genome with maternal arsenic exposure levels was assessed, adjusting for the estimated cell types. To further evaluate the methods that do not require reference cell types, we first simulated DNA methylation data at 150 CpG sites across 600 samples, with methylation at each site associated with both a variable of interest (e.g., level of arsenic exposure) and a set of latent variables representing “cell types”. We then simulated DNA methylation at additional CpG sites that were associated only with the latent variables.
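The simulation design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the abstract does not state the number of additional (null) CpG sites, the number of latent variables, the effect sizes, or the data scale, so `n_null`, `n_latent`, `effect`, `noise_sd`, and the use of independent Gaussian draws are all assumptions made here.

```python
import random


def simulate_methylation(n_samples=600, n_assoc=150, n_null=350,
                         n_latent=3, effect=0.5, noise_sd=1.0, seed=0):
    """Simulate methylation-like values (all defaults except 600/150 are assumed).

    Returns (exposure, latent, data, truth), where data[i][j] is the value for
    sample i at CpG j and truth[j] is True if CpG j depends on the exposure.
    """
    rng = random.Random(seed)

    # Variable of interest (e.g., an exposure level), one value per sample.
    exposure = [rng.gauss(0, 1) for _ in range(n_samples)]

    # Latent variables standing in for "cell types", one vector per sample.
    latent = [[rng.gauss(0, 1) for _ in range(n_latent)]
              for _ in range(n_samples)]

    n_sites = n_assoc + n_null
    # The first n_assoc sites are truly associated with the exposure.
    truth = [j < n_assoc for j in range(n_sites)]

    # Every site, associated or not, loads on the latent variables.
    loadings = [[rng.gauss(0, 1) for _ in range(n_latent)]
                for _ in range(n_sites)]

    data = []
    for i in range(n_samples):
        row = []
        for j in range(n_sites):
            value = sum(w * z for w, z in zip(loadings[j], latent[i]))
            if truth[j]:
                value += effect * exposure[i]
            value += rng.gauss(0, noise_sd)
            row.append(value)
        data.append(row)
    return exposure, latent, data, truth
```

Keeping the `truth` vector alongside the data is what later allows sensitivity and specificity to be computed for each reference-free method.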
Results
Without adjusting for cell types, only three CpG sites showed significant associations with maternal arsenic exposure at a false discovery rate (FDR) level of 0.05. Adjustment by FaST-LMM-EWASher did not identify any CpG sites. For the other methods, Figure 1 illustrates the overlap of the identified CpG sites. Further simulation studies on the methods that do not require reference data (i.e., FaST-LMM-EWASher, RefFreeEWAS, and SVA) revealed that RefFreeEWAS and SVA provided good and comparable sensitivities and specificities, whereas FaST-LMM-EWASher gave the lowest sensitivity but the highest specificity (Table 1).
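For readers unfamiliar with the quantities reported here, the following sketch shows one common way to call sites at an FDR level of 0.05 and to score the calls against a known truth. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption, and the function names are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a site is called at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis ranked at or below max_k (step-up rule).
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


def sensitivity_specificity(called, truth):
    """Compare called sites with the simulated truth."""
    true_pos = sum(1 for c, t in zip(called, truth) if c and t)
    true_neg = sum(1 for c, t in zip(called, truth) if not c and not t)
    n_pos = sum(1 for t in truth if t)
    n_neg = len(truth) - n_pos
    sens = true_pos / n_pos if n_pos else float("nan")
    spec = true_neg / n_neg if n_neg else float("nan")
    return sens, spec
```

A method that calls very few sites, as FaST-LMM-EWASher did on the real data, will tend toward low sensitivity and high specificity under this scoring, which is the pattern reported in the simulation.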
Conclusions
The results from the real data indicated that RefFreeEWAS and SVA identified a large number of CpG sites, and the results from SVA showed the highest agreement with all other approaches. The simulation studies further confirmed that RefFreeEWAS and SVA are comparable and perform better than FaST-LMM-EWASher. Overall, the findings support recommending SVA to adjust for cell types, given its highest agreement with the other methods and its favorable performance in the simulation studies.
References
1. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, Smith AV, Gudnason V: Heterogeneity in white blood cells has potential to confound DNA methylation measurements. PLoS One. 2012, 7 (10): e46705.
2. Talens RP, Boomsma DI, Tobi EW, Kremer D, Jukema JW, Willemsen G, Putter H, Slagboom PE, Heijmans BT: Variation, patterns, and temporal stability of DNA methylation: considerations for epigenetic epidemiology. FASEB Journal. 2010, 24 (9): 3135-3144.
3. Lin L-C, Wang S-L, Chang Y-C, Huang P-C, Cheng J-T, Su P-H, Liao P-C: Associations between maternal phthalate exposure and cord sex hormones in human infants. Chemosphere. 2011, 83 (8): 1192-1199.
4. Wang S-L, Su P-H, Jong S-B, Guo YL, Chou W-L, Päpke O: In utero exposure to dioxins and polychlorinated biphenyls and its relations to thyroid function and growth hormone in newborns. Environmental Health Perspectives. 2005, 1645-1650.
5. Jaffe AE, Irizarry RA: Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biology. 2014, 15 (2): R31.
6. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, Wiencke JK, Kelsey KT: DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics. 2012, 13: 86.
7. Zou J, Lippert C, Heckerman D, Aryee M, Listgarten J: Epigenome-wide association studies without the need for cell-type composition. Nature Methods. 2014, 11 (3): 309-311.
8. Houseman EA, Molitor J, Marsit CJ: Reference-free cell mixture adjustments in analysis of DNA methylation data. Bioinformatics. 2014, 30 (10): 1431-1439.
9. Leek JT, Storey JD: Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genetics. 2007, 3 (9): e161.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Kaushal, A., Zhang, H., Karmaus, W.J. et al. Which methods to choose to correct cell types in genome-scale blood-derived DNA methylation data? BMC Bioinformatics 16 (Suppl 15), P7 (2015). https://doi.org/10.1186/1471-2105-16-S15-P7
DOI: https://doi.org/10.1186/1471-2105-16-S15-P7
Keywords
- Simulation Study
- Latent Variable
- High Agreement
- Arsenic Exposure
- High Throughput Method