Table 1. For each dataset (columns), we report the number of samples per class, as well as the cardinality of the dataset composed solely of enhancers and promoters (rows “Total E + P”), for genome version hg19 (top table) and genome version hg38 (bottom table)

From: Boosting tissue-specific prediction of active cis-regulatory regions through deep learning and Bayesian optimization techniques

| Genome version | Labels | HepG2 | K562 | GM12878 | Total | HelaS3 |
| --- | --- | --- | --- | --- | --- | --- |
| hg19 | Active enhancer (AE) | 1465 | 894 | 2878 | 5237 | 1847 |
| hg19 | Inactive enhancer (IE) | 34,556 | 34,392 | 28,156 | 97,104 | 32,179 |
| hg19 | Active promoter (AP) | 11,467 | 10,076 | 10,816 | 32,359 | 10,759 |
| hg19 | Inactive promoter (IP) | 96,184 | 82,829 | 73,891 | 252,904 | 79,004 |
| hg19 | Total E + P | 143,672 | 128,191 | 115,741 | 387,604 | 123,789 |
| hg19 | Active exon (AX) | 9931 | 9033 | 8226 | | 9123 |
| hg19 | Inactive exon (IX) | 19,071 | 20,261 | 19,078 | | 22,071 |
| hg19 | Unknown (UK) | 79,417 | 78,081 | 80,004 | | 81,502 |
| hg19 | Total | 252,091 | 235,566 | 223,049 | | 236,485 |
| hg38 | Active enhancer (AE) | 7177 | 5524 | 11,589 | 24,290 | |
| hg38 | Inactive enhancer (IE) | 56,108 | 57,761 | 51,696 | 165,565 | |
| hg38 | Active promoter (AP) | 14,092 | 12,524 | 14,036 | 40,652 | |
| hg38 | Inactive promoter (IP) | 85,789 | 87,357 | 85,845 | 258,991 | |
| hg38 | Total E + P | 163,166 | 163,166 | 163,166 | 489,498 | |

  1. Column “Total” allows comparing the total cardinality of CRRs across the hg19 and hg38 datasets. Since we also have non-CRR regions for genome version hg19, row “Total” in the top table reports the total number of samples per cell line in the hg19 dataset
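
As a quick consistency check, the “Total E + P” row and the “Total” column can be recomputed from the per-class counts. The Python sketch below is illustrative only: the names `CELL_LINES`, `COUNTS` and `crr_totals` are hypothetical and do not come from the paper. It covers only the three cell lines that the table reports for both genome versions; HelaS3 is left out because its counts are given for hg19 only.

```python
# Per-class CRR counts copied from Table 1, in the order HepG2, K562, GM12878.
# All names in this sketch are hypothetical helpers, not part of the paper.
CELL_LINES = ("HepG2", "K562", "GM12878")

COUNTS = {
    "hg19": {
        "AE": (1465, 894, 2878),
        "IE": (34556, 34392, 28156),
        "AP": (11467, 10076, 10816),
        "IP": (96184, 82829, 73891),
    },
    "hg38": {
        "AE": (7177, 5524, 11589),
        "IE": (56108, 57761, 51696),
        "AP": (14092, 12524, 14036),
        "IP": (85789, 87357, 85845),
    },
}


def crr_totals(per_class):
    """Return (E + P total per cell line, sum of those totals)."""
    # zip(*...) turns the per-class rows into per-cell-line columns.
    per_cell_line = tuple(sum(column) for column in zip(*per_class.values()))
    return per_cell_line, sum(per_cell_line)


for genome, per_class in COUNTS.items():
    per_cell_line, grand_total = crr_totals(per_class)
    print(genome, dict(zip(CELL_LINES, per_cell_line)), "Total:", grand_total)
```

Under these assumptions, the recomputed sums can be compared directly against the “Total E + P” row and the “Total” column of the table.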