Table 2 Sizes of inputs and outputs for three different cell lines, and execution times (in minutes) for the TICA query over four cluster configurations

From: PyGMQL: scalable data extraction and analysis for heterogeneous genomic datasets

 

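| | GM12878 | HepG2 | K562 |
|---|---:|---:|---:|
| Input samples | 164 | 224 | 347 |
| Distinct TFs | 116 | 192 | 268 |
| Input regions | 3,003,121 | 4,384,181 | 6,101,933 |
| Output samples | 13,454 | 36,330 | 71,612 |
| Output regions | 109,858,355 | 213,499,617 | 381,255,507 |
| Output size (MB) | 3,122 | 6,064 | 10,921 |
| 1 node execution time (min) ∗ | 26.73 | 73.05 | 246.85 |
| 3 nodes execution time (min) ∗ | 10.40 | 26.28 | 91.27 |
| 5 nodes execution time (min) ∗ | 7.21 | 16.67 | 59.12 |
| 10 nodes execution time (min) ∗ | 4.75 | 9.67 | 32.92 |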
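
To read the scaling behaviour off these execution times, the speedup relative to the single-node run and the corresponding parallel efficiency (speedup divided by node count) can be computed directly. The sketch below is illustrative only and is not part of PyGMQL: the `times` dictionary and the `scaling` helper are hypothetical names, and the values are copied from the execution-time rows of Table 2.

```python
# Execution times in minutes, copied from the execution-time rows of Table 2.
# Keys of the inner dictionaries are the number of cluster nodes.
times = {
    "GM12878": {1: 26.73, 3: 10.40, 5: 7.21, 10: 4.75},
    "HepG2": {1: 73.05, 3: 26.28, 5: 16.67, 10: 9.67},
    "K562": {1: 246.85, 3: 91.27, 5: 59.12, 10: 32.92},
}


def scaling(per_nodes):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node run.

    Hypothetical helper: speedup = t(1 node) / t(n nodes),
    efficiency = speedup / n.
    """
    base = per_nodes[1]
    return {n: (base / t, base / t / n) for n, t in per_nodes.items()}


for cell_line, per_nodes in times.items():
    for n, (speedup, efficiency) in scaling(per_nodes).items():
        print(
            f"{cell_line:8s} {n:2d} nodes  "
            f"speedup {speedup:5.2f}  efficiency {efficiency:4.2f}"
        )
```

On these figures, the 10-node speedup is roughly 5.6 for GM12878 and roughly 7.5 for HepG2 and K562, which suggests that the larger inputs make better use of the added nodes.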