Fig. 1 | BMC Bioinformatics

Fig. 1

From: Comprehensive study of semi-supervised learning for DNA methylation-based supervised classification of central nervous system tumors

Training and testing scheme for the 11 evaluated SSL models. After preprocessing the GSE90496 methylation data, probes with a standard deviation greater than 0.3 across all 2,801 samples were selected as the features for the 11 SSL models. Thirty percent of the samples were set aside as an inductive test set to evaluate each SSL model independently. The remaining 70% were used for training and were partitioned proportionally into a labeled and an unlabeled set: half of the training data (35% of all samples) served as labeled examples, while the other half (35% of all samples) served as unlabeled examples or as a transductive test set. The partitioning was bootstrapped seven times. In every bootstrap, samples were drawn proportionally from each molecular methylation group so that the class distribution stayed similar to the original class distribution
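
The scheme in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`select_probes`, `stratified_split`, `make_partitions`), the use of the sample standard deviation, the fixed seed, and the choice to hold the 30% test set fixed while only the labeled/unlabeled partition is redrawn seven times are all assumptions made for illustration. The caption's "bootstrapped" is read here as repeated random partitioning without replacement, which is also an assumption.

```python
import random
import statistics
from collections import defaultdict


def select_probes(beta, threshold=0.3):
    """Return indices of probes (columns) whose SD across samples exceeds threshold.

    beta is a list of samples, each a list of per-probe values.
    Assumption: sample standard deviation; the caption does not say which.
    """
    n_probes = len(beta[0])
    return [j for j in range(n_probes)
            if statistics.stdev([row[j] for row in beta]) > threshold]


def stratified_split(indices, labels, fraction, rng):
    """Split indices into (picked, rest), taking about `fraction` of each class."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    picked, rest = [], []
    for cls in sorted(by_class):
        members = by_class[cls][:]
        rng.shuffle(members)
        k = round(fraction * len(members))  # per-class quota keeps proportions
        picked.extend(members[:k])
        rest.extend(members[k:])
    return picked, rest


def make_partitions(labels, test_fraction=0.3, labeled_fraction=0.5,
                    n_repeats=7, seed=0):
    """Hold out an inductive test set, then redraw labeled/unlabeled splits.

    Assumption: the held-out test set is drawn once and reused for every repeat.
    """
    rng = random.Random(seed)
    all_idx = list(range(len(labels)))
    test_idx, train_idx = stratified_split(all_idx, labels, test_fraction, rng)
    repeats = []
    for _ in range(n_repeats):
        labeled, unlabeled = stratified_split(train_idx, labels,
                                              labeled_fraction, rng)
        repeats.append((labeled, unlabeled))
    return test_idx, train_idx, repeats
```

Calling `make_partitions(labels)` on a list of methylation-group labels returns the held-out indices, the training indices, and seven `(labeled, unlabeled)` index pairs. In each pair the unlabeled indices double as the transductive test set.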
