Table 2 Top table: for each dataset in the unbalanced setup (first to fifth column) and each task (first to fifth row from top to bottom) executed on data from genome version hg19, we report the class unbalancing ratio, computed as the ratio between the cardinality of the most-represented class and the cardinality of the least-represented class. Bottom table: the corresponding unbalancing ratios for data from genome version hg38

From: Boosting tissue-specific prediction of active cis-regulatory regions through deep learning and Bayesian optimization techniques

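Unbalancing ratios for different setups

| Genome version | Task | HepG2 (unbalanced setup) | HelaS3 (unbalanced setup) | K562 (unbalanced setup) | GM12878 (unbalanced setup) | Average (unbalanced setup) | All cell lines (full-balanced setup [34]) |
|---|---|---|---|---|---|---|---|
| hg19 | IE versus IP | 2.78 | 2.46 | 2.41 | 2.62 | 2.57 | 1 |
| hg19 | AP versus IP | 8.39 | 7.34 | 8.22 | 6.83 | 7.70 | 2 |
| hg19 | AE versus IE | 23.59 | 17.42 | 38.47 | 9.78 | 22.32 | 2 |
| hg19 | AE versus AP | 7.83 | 5.83 | 11.27 | 3.76 | 7.17 | 1 |
| hg19 | AE + AP versus else | 18.49 | 17.76 | 20.47 | 15.29 | 18.00 | 8 |
| hg19 | Avg per cell line | 12.22 | 10.16 | 16.17 | 7.66 | 11.55 | 2.8 |

| Genome version | Task | HepG2 (unbalanced setup) | K562 (unbalanced setup) | GM12878 (unbalanced setup) | Average (unbalanced setup) |
|---|---|---|---|---|---|
| hg38 | IE versus IP | 1.53 | 1.51 | 1.66 | 1.57 |
| hg38 | AP versus IP | 6.09 | 6.98 | 6.12 | 6.39 |
| hg38 | AE versus IE | 7.82 | 10.46 | 4.46 | 7.58 |
| hg38 | AE versus AP | 1.96 | 2.27 | 1.21 | 1.81 |
| hg38 | Avg per cell line | 4.35 | 5.30 | 3.36 | 4.34 |
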
The fifth column (Average) shows the average unbalancing ratio over the four cell lines when the unbalanced setup is used. Tasks AE versus IE and AE + AP versus else are, on average, the most unbalanced. Full-balanced setup (sixth column): the unbalancing ratio in each task is the same for all cell lines. Comparing the unbalancing ratios averaged over the cell lines (fifth column) with those of the full-balanced setup (sixth column) shows the striking difference between the two setups, for example 22.32 versus 2 for AE versus IE.
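
The ratio defined in the caption, and the averages reported for hg19, can be reproduced with a short Python sketch. This is an illustration only, not the authors' code: the function names and the example class counts are hypothetical, while the ratios are copied from the hg19 values of Table 2.

```python
from statistics import mean


def unbalancing_ratio(class_counts):
    """Cardinality of the most-represented class divided by that of the
    least-represented class, as defined in the table caption."""
    counts = list(class_counts.values())
    if not counts or min(counts) <= 0:
        raise ValueError("every class needs at least one sample")
    return max(counts) / min(counts)


# Hypothetical class counts, for illustration only (not taken from the paper).
example_counts = {"IE": 2780, "IP": 1000}
example_ratio = unbalancing_ratio(example_counts)

# hg19 ratios for the unbalanced setup, copied from Table 2.
HG19_UNBALANCED = {
    "IE versus IP": {"HepG2": 2.78, "HelaS3": 2.46, "K562": 2.41, "GM12878": 2.62},
    "AP versus IP": {"HepG2": 8.39, "HelaS3": 7.34, "K562": 8.22, "GM12878": 6.83},
    "AE versus IE": {"HepG2": 23.59, "HelaS3": 17.42, "K562": 38.47, "GM12878": 9.78},
    "AE versus AP": {"HepG2": 7.83, "HelaS3": 5.83, "K562": 11.27, "GM12878": 3.76},
    "AE + AP versus else": {"HepG2": 18.49, "HelaS3": 17.76, "K562": 20.47, "GM12878": 15.29},
}

# hg19 ratios for the full-balanced setup (identical for all cell lines).
HG19_FULL_BALANCED = {
    "IE versus IP": 1,
    "AP versus IP": 2,
    "AE versus IE": 2,
    "AE versus AP": 1,
    "AE + AP versus else": 8,
}


def task_averages(table):
    """Average ratio of each task over its cell lines (the 'Average' column)."""
    return {task: mean(by_cell.values()) for task, by_cell in table.items()}


def cell_line_averages(table):
    """Average ratio of each cell line over the tasks ('Avg per cell line' row)."""
    cell_lines = next(iter(table.values())).keys()
    return {cell: mean(row[cell] for row in table.values()) for cell in cell_lines}


per_task = task_averages(HG19_UNBALANCED)
per_cell = cell_line_averages(HG19_UNBALANCED)
overall_unbalanced = mean(per_task.values())
overall_full_balanced = mean(HG19_FULL_BALANCED.values())

for task, value in per_task.items():
    print(f"{task}: {value:.2f} (unbalanced) vs {HG19_FULL_BALANCED[task]} (full-balanced)")
print(f"overall: {overall_unbalanced:.2f} vs {overall_full_balanced:.1f}")
```

Because the inputs are the already-rounded table entries, the recomputed averages may differ from the published ones in the last digit.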