
Table 2 DMR prediction performance of hybrid model learned from all training data

From: Predicting environmentally responsive transgenerational differential DNA methylated regions (epimutations) in the genome using a hybrid deep-machine learning approach

| Chr | #Predicted DMR | %Recall | #max poss. DMRs | %max poss. DMRs | %Genome |
|-----|----------------|---------|-----------------|-----------------|---------|
| 1   | 127,816        | 96.09   | 267,040         | 95.03           | 45.54   |
| 2   | 141,619        | 85.88   | 250,909         | 94.68           | 53.44   |
| 3   | 105,790        | 92.90   | 168,257         | 94.52           | 60.10   |
| 4   | 159,974        | 97.34   | 152,775         | 95.49           | 87.41   |
| 5   | 74,363         | 94.15   | 164,461         | 95.61           | 43.23   |
| 6   | 103,072        | 93.77   | 139,443         | 95.50           | 70.59   |
| 7   | 76,805         | 83.48   | 137,363         | 95.39           | 53.33   |
| 8   | 57,713         | 97.74   | 127,323         | 96.45           | 43.72   |
| 9   | 73,307         | 96.03   | 115,863         | 95.75           | 60.58   |
| 10  | 93,641         | 87.19   | 108,271         | 97.54           | 84.36   |
| 11  | 39,174         | 87.54   | 85,854          | 96.46           | 44.01   |
| 12  | 43,108         | 91.82   | 110,497         | 97.78           | 84.52   |
| 13  | 44,317         | 96.31   | 108,528         | 95.00           | 39.21   |
| 14  | 62,311         | 94.63   | 104,104         | 94.64           | 54.65   |
| 15  | 96,840         | 74.41   | 84,065          | 94.45           | 88.03   |
| 16  | 62,548         | 94.86   | 84,713          | 95.18           | 70.27   |
| 17  | 71,651         | 98.56   | 83,393          | 95.85           | 80.50   |
| 18  | 53,671         | 98.38   | 83,408          | 95.87           | 61.69   |
| 19  | 51,335         | 88.20   | 58,891          | 96.54           | 84.18   |
| 20  | 18,205         | 98.33   | 47,449          | 96.83           | 37.12   |
| X   | 47,092         | 68.45   | 144,335         | 91.35           | 2.98    |
| Y   | 2,608          | 91.13   | 3,159           | 95.43           | 85.94   |
| ALL | 1,748,888      | 95.49   | 2,742,978       | 95.40           | 63.75   |

#Predicted DMR is the number of DMRs predicted in each chromosome by the hybrid model trained on data from that chromosome; the ALL row gives the number predicted across the whole genome by the hybrid model trained on whole-genome data. %Recall is the percentage of the training DMRs that the model correctly predicts as DMRs. For comparison, the “maximum possible DMRs” set is defined as all 1000 bp regions minus those that are clearly non-DMRs because they contain no CpGs or more than 20% (200) CpGs. The size of this set is an upper bound on the number of possible DMRs, and the number of predicted DMRs should be well below it. The table gives the size of this set (#max poss. DMRs) and the percentage of the chromosome or whole genome that it represents (%max poss. DMRs). %Genome is the percentage of the chromosome, or of the entire genome for ALL, that the predicted DMRs represent; it should be well below the %max poss. DMRs value.
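The definitions in this note can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes non-overlapping 1000 bp windows, counts a CpG as a "CG" dinucleotide on the given strand, and ignores CpGs that straddle a window boundary. The function names (`count_cpgs`, `max_possible_dmr_windows`, `recall_percent`) are hypothetical and chosen only for illustration.

```python
from typing import List, Set


def count_cpgs(window: str) -> int:
    """Count CG dinucleotides in a sequence window (case-insensitive)."""
    return window.upper().count("CG")


def max_possible_dmr_windows(
    sequence: str, window_size: int = 1000, max_cpgs: int = 200
) -> List[int]:
    """Return start positions of windows kept as possible DMRs.

    A window is dropped as a clear non-DMR when it has no CpGs or more
    than `max_cpgs` CpGs. Non-overlapping tiling is an assumption here.
    """
    kept = []
    for start in range(0, len(sequence) - window_size + 1, window_size):
        n_cpg = count_cpgs(sequence[start:start + window_size])
        if 0 < n_cpg <= max_cpgs:
            kept.append(start)
    return kept


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part` (0.0 if `whole` is 0)."""
    return 100.0 * part / whole if whole else 0.0


def recall_percent(training_dmrs: Set[int], predicted_dmrs: Set[int]) -> float:
    """Percentage of training DMRs that are also predicted as DMRs."""
    return percent(len(training_dmrs & predicted_dmrs), len(training_dmrs))


if __name__ == "__main__":
    # Toy example with 4 bp windows and a threshold of 1 CpG per window.
    toy = "AAAA" + "ACGT" + "CGCG"
    kept = max_possible_dmr_windows(toy, window_size=4, max_cpgs=1)
    n_windows = len(toy) // 4
    print("kept window starts:", kept)
    print("% max possible:", percent(len(kept), n_windows))
    print("% recall:", recall_percent({0, 4, 8}, {4, 8, 12}))
```

With real data, `percent(len(kept), n_windows)` would play the role of the %max poss. DMRs column, and the same `percent` helper applied to the predicted windows would give %Genome under the same tiling assumption.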