Data preprocessing. (1) Only reads overlapping a CpG on the Infinium 450K chip are retained. (2) A window is extended to the left and right of each CpG according to the maximum read length, so that every CpG has a feature representation of the same width. (3) For each CpG, a consensus sequence is formed from its set of overlapping reads, and the position-specific frequency of each base is extracted. (4) Finally, CpG positions are masked, either by introducing gaps in the consensus sequence or by zeroing the base frequencies at those positions.
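
Steps (3) and (4) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the reads for one CpG have already been filtered and placed into a fixed-width window (steps 1 and 2), with `-` marking positions a read does not cover. The function names, the `-` gap symbol, the tie-breaking rule, and the choice to mask both the C and the G of the dinucleotide are all hypothetical choices made for illustration.

```python
from collections import Counter

BASES = "ACGT"   # column order of the frequency vectors (assumption)
GAP = "-"        # symbol for uncovered or masked positions (assumption)


def base_frequencies(reads, width):
    """Relative frequency of A, C, G, T at each window position.

    Only reads that cover a position contribute to it; positions
    with no coverage get an all-zero row.
    """
    freqs = []
    for pos in range(width):
        counts = Counter(r[pos] for r in reads if r[pos] in BASES)
        total = sum(counts.values())
        freqs.append([counts[b] / total if total else 0.0 for b in BASES])
    return freqs


def consensus(freqs):
    """Most frequent base per position; ties go to the first base in BASES."""
    out = []
    for row in freqs:
        out.append(GAP if sum(row) == 0 else BASES[row.index(max(row))])
    return "".join(out)


def mask_positions(seq, freqs, positions):
    """Return copies with the given positions gapped and zeroed."""
    seq = list(seq)
    freqs = [row[:] for row in freqs]   # do not modify the caller's data
    for p in positions:
        seq[p] = GAP
        freqs[p] = [0.0] * len(BASES)
    return "".join(seq), freqs


# Toy window of width 6 with the CpG at positions 2 and 3 (0-based).
reads = ["ACCGTA", "ACTGTA", "-CCGT-"]
freqs = base_frequencies(reads, width=6)
masked_seq, masked_freqs = mask_positions(consensus(freqs), freqs, [2, 3])
print(masked_seq)  # AC--TA
```

In this toy window the second read carries a T at the cytosine position, so the unmasked frequency row there is two thirds C and one third T. Masking removes exactly this methylation-dependent signal from both the consensus sequence and the frequency matrix.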