Fig. 1 | BMC Bioinformatics

Fig. 1

From: Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer

Schematic of the study method, the architecture of CHCT, and the identification of informative CpG sites by the Tukey-Kramer test. a Raw cancer methylation data covering 30 cancer types were downloaded from TCGA and GEO. After preprocessing, the data were clustered by UPGMA to divide the cancer types into groups, which served as the labels for the first-layer classifier. We first used an ANOVA test to select probes with significant differences, and then used the Tukey-Kramer test to retain probes that differ from all of the other 11 groups. The Boruta algorithm was applied to select the most informative CpGs. Finally, we built a predictive model with these CpGs as features and evaluated it on the test set. The second-layer classifier was built in the same way. To assess how well the models generalize, we tested them on an independent cohort of primary cancer methylation data collected from GEO. b Architecture of CHCT. CHCT is a tool with a two-tier architecture: a sample is first classified by the first layer of CHCT, and that result is used to call the corresponding second-layer prediction model, which produces the final prediction (for OV, THYM, and LIHC, the first-layer model outputs the final prediction directly). c The one-vs-all approach screens for CpG sites that distinguish each cancer type from all other cancer types; the illustration considers a hypothetical differential of four cancer types. The pairwise differential approach aims to identify the best markers for differentiating each possible pair of cancer types.
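
The grouping step in panel a relies on UPGMA (unweighted average-linkage) clustering. A minimal pure-Python sketch follows. It assumes a precomputed symmetric distance matrix between cancer types (for example, distances between their mean methylation profiles; the caption does not state which distance was used) and a fixed number of groups at which to stop. `upgma_groups` and the toy labels are hypothetical and this is not the study's code.

```python
from typing import Dict, List, Tuple


def upgma_groups(labels: List[str], dist: List[List[float]], n_groups: int) -> List[List[str]]:
    """Merge clusters with UPGMA until n_groups clusters remain."""
    clusters: Dict[int, List[str]] = {i: [name] for i, name in enumerate(labels)}
    # Pairwise cluster distances keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j] for i in clusters for j in clusters if i < j
    }
    next_id = len(labels)
    while len(clusters) > n_groups:
        a, b = min(d, key=d.get)  # closest pair of clusters
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        # UPGMA update: distance to the merged cluster is the size-weighted mean.
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            d[(c, next_id)] = (na * dac + nb * dbc) / (na + nb)
        # Drop all distances that involve the two clusters just merged.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Toy distances between four hypothetical cancer types.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 1, 6, 6],
    [1, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]
print(upgma_groups(labels, dist, 2))  # [['A', 'B'], ['C', 'D']]
```

The resulting groups would play the role of the first-layer labels; in practice a library routine such as average-linkage hierarchical clustering would normally replace this hand-written loop.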
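
The Tukey-Kramer screen in panel a keeps a probe for a group only when that group differs from every other group. The sketch below computes the Tukey-Kramer statistic q = |mean_i - mean_j| / sqrt(MSE/2 * (1/n_i + 1/n_j)) for one probe, using the one-way ANOVA error mean square, and applies the "different from all others" rule. The critical value `q_crit` is assumed to be supplied from a studentized range table (or, for example, SciPy's `studentized_range` distribution); 4.34 is the tabulated 5% value for 3 groups and 6 error degrees of freedom. Function names and data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def tukey_kramer_q(groups: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Tukey-Kramer q statistic for every pair of groups at a single probe."""
    n_total = sum(len(v) for v in groups.values())
    k = len(groups)
    means = {g: mean(v) for g, v in groups.items()}
    # Pooled within-group mean square (one-way ANOVA error term, df = N - k).
    sse = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    mse = sse / (n_total - k)
    q = {}
    for a, b in combinations(sorted(groups), 2):
        # The Tukey-Kramer standard error allows unequal group sizes.
        se = math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q


def groups_distinct_from_all(groups: Dict[str, Sequence[float]], q_crit: float) -> List[str]:
    """Groups whose mean at this probe differs significantly from every other group."""
    q = tukey_kramer_q(groups)
    keep = []
    for g in sorted(groups):
        others = [h for h in groups if h != g]
        if all(q[tuple(sorted((g, h)))] > q_crit for h in others):
            keep.append(g)
    return keep


# Toy beta values for one probe in three hypothetical groups.
probe = {
    "A": [0.90, 0.85, 0.88],
    "B": [0.20, 0.25, 0.22],
    "C": [0.21, 0.24, 0.20],
}
print(groups_distinct_from_all(probe, q_crit=4.34))  # ['A']
```

Here B and C are not separable from each other, so only A passes; repeating this over all probes would give each group its candidate CpG list before feature selection.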
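
The two-tier prediction flow in panel b can be sketched as a simple router. This is a minimal illustration, not CHCT's implementation: classifiers are modelled as hypothetical callables, and a group with no second-layer model returns its first-layer label directly, mirroring the OV, THYM, and LIHC case. `predict_two_tier`, `GROUP_A`, and the `TYPE_*` labels are made up for the example.

```python
from typing import Callable, Dict, Sequence

# A classifier maps one sample's methylation profile to a label (hypothetical interface).
Classifier = Callable[[Sequence[float]], str]


def predict_two_tier(
    sample: Sequence[float],
    first_layer: Classifier,
    second_layer: Dict[str, Classifier],
) -> str:
    """Predict a group first, then refine it with that group's model if one exists."""
    group = first_layer(sample)
    model = second_layer.get(group)
    if model is None:
        # Groups without a second-layer model stop at the first layer.
        return group
    return model(sample)


# Toy threshold rules standing in for trained models (hypothetical).
def toy_first(x: Sequence[float]) -> str:
    return "OV" if x[0] > 0.8 else "GROUP_A"


def toy_group_a(x: Sequence[float]) -> str:
    return "TYPE_1" if x[1] > 0.5 else "TYPE_2"


layers = {"GROUP_A": toy_group_a}
print(predict_two_tier([0.9, 0.1], toy_first, layers))  # OV
print(predict_two_tier([0.2, 0.7], toy_first, layers))  # TYPE_1
```

Keeping the second layer in a dictionary keyed by first-layer group makes the "direct answer" case fall out naturally when a key is absent.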
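
The difference between the two screening designs in panel c shows up in how many contrasts each requires: with k cancer types, one-vs-all needs k comparisons (each type against the rest), whereas pairwise needs k(k-1)/2. The short sketch below enumerates both for a hypothetical four-type differential; the type names and the `"rest"` placeholder are illustrative only.

```python
from itertools import combinations
from typing import List, Tuple


def one_vs_all_contrasts(types: List[str]) -> List[Tuple[str, str]]:
    """One contrast per cancer type: that type against all other types pooled."""
    return [(t, "rest") for t in types]


def pairwise_contrasts(types: List[str]) -> List[Tuple[str, str]]:
    """One contrast per unordered pair of cancer types."""
    return list(combinations(types, 2))


types = ["T1", "T2", "T3", "T4"]  # hypothetical four-type differential
print(len(one_vs_all_contrasts(types)))  # 4
print(len(pairwise_contrasts(types)))    # 6
```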
