Table 2 DMR prediction performance of hybrid model learned from all training data

From: Predicting environmentally responsive transgenerational differential DNA methylated regions (epimutations) in the genome using a hybrid deep-machine learning approach

Chr   #Predicted DMR   %Recall   #max poss. DMRs   %max poss. DMRs   %Genome
1     127,816          96.09     267,040           95.03             45.54
2     141,619          85.88     250,909           94.68             53.44
3     105,790          92.90     168,257           94.52             60.10
4     159,974          97.34     152,775           95.49             87.41
5     74,363           94.15     164,461           95.61             43.23
6     103,072          93.77     139,443           95.50             70.59
7     76,805           83.48     137,363           95.39             53.33
8     57,713           97.74     127,323           96.45             43.72
9     73,307           96.03     115,863           95.75             60.58
10    93,641           87.19     108,271           97.54             84.36
11    39,174           87.54     85,854            96.46             44.01
12    43,108           91.82     110,497           97.78             84.52
13    44,317           96.31     108,528           95.00             39.21
14    62,311           94.63     104,104           94.64             54.65
15    96,840           74.41     84,065            94.45             88.03
16    62,548           94.86     84,713            95.18             70.27
17    71,651           98.56     83,393            95.85             80.50
18    53,671           98.38     83,408            95.87             61.69
19    51,335           88.20     58,891            96.54             84.18
20    18,205           98.33     47,449            96.83             37.12
X     47,092           68.45     144,335           91.35             2.98
Y     2,608            91.13     3,159             95.43             85.94
ALL   1,748,888        95.49     2,742,978         95.40             63.75
  1. #Predicted DMR is the number of DMRs predicted in a chromosome by the hybrid model trained on data from that chromosome; for ALL, it is the number of DMRs predicted across the whole genome by the hybrid model trained on data from the whole genome. %Recall is the percentage of the training DMRs that the model correctly predicts as DMRs. For comparison, the “maximum possible DMRs” set is defined as all 1000 bp regions minus those that are clearly non-DMRs because they contain no CpGs or more than 20% (200) CpGs. The size of this set serves as an upper bound on the number of possible DMRs, and the number of predicted DMRs should be well below this bound. The table shows the size of this set (#max poss. DMRs) and the percentage of the chromosome or whole genome that it represents (%max poss. DMRs). The %Genome column shows the percentage of the chromosome, or of the entire genome for ALL, that the predicted DMRs represent. The %Genome value should be well below the %max poss. DMRs value.
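The two definitions in the footnote, the “maximum possible DMRs” filter and %Recall, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that the 1000 bp regions tile the sequence without overlap, that a CpG is counted as one occurrence of the dinucleotide CG on the given strand, and that regions are identified by hashable IDs. All function names and parameters are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def count_cpgs(window: str) -> int:
    """Count CG dinucleotides in a window (assumed definition of a CpG)."""
    return window.upper().count("CG")


def is_possible_dmr(window: str, max_cpgs: int = 200) -> bool:
    """Keep a window unless it has no CpGs or more than max_cpgs CpGs."""
    n = count_cpgs(window)
    return 0 < n <= max_cpgs


def max_possible_dmrs(sequence: str, window_size: int = 1000,
                      max_cpgs: int = 200) -> Tuple[int, float]:
    """Return (#max poss. DMRs, %max poss. DMRs) for one sequence.

    Assumes non-overlapping windows; a trailing partial window is ignored,
    as are CpGs that straddle a window boundary.
    """
    total = 0
    kept = 0
    for start in range(0, len(sequence) - window_size + 1, window_size):
        total += 1
        if is_possible_dmr(sequence[start:start + window_size], max_cpgs):
            kept += 1
    percent = 100.0 * kept / total if total else 0.0
    return kept, percent


def percent_recall(training_dmrs: Iterable[Hashable],
                   predicted_dmrs: Iterable[Hashable]) -> float:
    """Percentage of training DMRs that are also predicted as DMRs."""
    training = set(training_dmrs)
    if not training:
        return 0.0
    return 100.0 * len(training & set(predicted_dmrs)) / len(training)


if __name__ == "__main__":
    # Toy sequence of four windows: no CpGs, 500 CpGs, 1 CpG, exactly 200 CpGs.
    toy = "A" * 1000 + "CG" * 500 + "CG" + "A" * 998 + "CG" * 200 + "A" * 600
    print(max_possible_dmrs(toy))
    print(percent_recall({1, 2, 3, 4}, {2, 3, 9}))
```

In the toy sequence, the first two windows are excluded (no CpGs, and more than 200 CpGs) and the last two are kept. A window with exactly 200 CpGs is retained because the footnote excludes only regions with more than 200.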