Fig. 2 | BMC Bioinformatics

Fig. 2

From: A novel approach toward optimal workflow selection for DNA methylation biomarker discovery

The overall design of the tissue-aware simulator (TASA) and the evaluation methods used to select the best approach. A The simulation starts from source tissue data and a target tissue of interest. To identify the borders of candidate differentially methylated regions (DMRs), the source data are preprocessed and clustered. Reference data from Methbank [33] were used for both the source (input) and target (output) tissues. Two distributions were sampled to mimic the characteristics of the output tissues, and the differences in methylation level between the outputs and inputs were added to the beta-values of probes in the candidate DMRs. Using this algorithm, four different approaches were considered for simulating the target tissues. B PCA dispersion scores for the four simulation approaches. The first graph shows the dispersion score between simulated and real CD8 samples (GSE59065 [37]); a lower score is better. The second graph shows the dispersion score between simulated CD8 samples and real monocyte samples (GSE103541 [38]); a higher score indicates better performance. The third graph shows the dispersion score between real monocytes (GSE56046 [31], GSE103541 [38]) and a combination of real (GSE59065 [37]) and simulated CD8 samples; a higher score indicates better performance. C Comparison of cell-type deconvolution percentages between simulated data and a matched control dataset, on a logarithmic scale. Cell-type percentages were calculated for the different CD8 simulation approaches and for a real CD8 control dataset (GSE59065 [37]). Different datasets are represented by different colors
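
The shifting step described for panel A can be illustrated with a minimal sketch. This is not the TASA implementation: the function name `simulate_target`, the probe IDs, the Gaussian noise model, and the `noise_sd` parameter are all assumptions made for illustration. The sketch draws one value around the reference mean of the target tissue and one around the reference mean of the source tissue, then adds their difference to the source beta-value of each probe inside a candidate DMR.

```python
import random


def simulate_target(source_beta, source_ref, target_ref, dmr_probes,
                    noise_sd=0.02, seed=0):
    """Shift source beta-values toward a target tissue inside candidate DMRs.

    All inputs are dicts keyed by probe ID (hypothetical IDs in the example).
    The Gaussian draws are an assumption; the caption does not name the
    distributions that were sampled.
    """
    rng = random.Random(seed)
    simulated = dict(source_beta)  # probes outside DMRs are left unchanged
    for probe in dmr_probes:
        # One draw around each reference mean (assumed Gaussian noise).
        target_draw = rng.gauss(target_ref[probe], noise_sd)
        source_draw = rng.gauss(source_ref[probe], noise_sd)
        delta = target_draw - source_draw
        # Beta-values are proportions, so clamp the result to [0, 1].
        simulated[probe] = min(1.0, max(0.0, source_beta[probe] + delta))
    return simulated


# Toy data: three probes, two of which fall inside a candidate DMR.
source_beta = {"cg01": 0.80, "cg02": 0.10, "cg03": 0.50}
source_ref = {"cg01": 0.82, "cg02": 0.12, "cg03": 0.50}
target_ref = {"cg01": 0.30, "cg02": 0.60, "cg03": 0.50}

simulated = simulate_target(source_beta, source_ref, target_ref,
                            dmr_probes=["cg01", "cg02"])
```

In this toy run, `cg01` moves down and `cg02` moves up toward the target reference, while `cg03`, which lies outside the candidate DMRs, keeps its source value. Fixing the seed makes the simulated sample reproducible.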
