Table 3 Running time breakdown, in seconds, of the two FastKmer stages on the 32GB dataset with k fixed to 28 and a decreasing number of executors

From: Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

 

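| Stage | Metric | 64 Workers | 32 Workers | 16 Workers | 8 Workers |
|---|---|---|---|---|---|
| Stage 1 | Scheduler Delay time (s) | 0.07 | 0.08 | 0.1 | 0.26 |
| Stage 1 | Executor Deserialization time (s) | 0.93 | 1.01 | 2.15 | 3.98 |
| Stage 1 | Executor Compute time (s) | 351.4 | 580.9 | 1112.51 | 2655.48 |
| Stage 1 | Shuffle Read time (s) | 0 | 0 | 0 | 0 |
| Stage 1 | Shuffle Write time (s) | 1.22 | 2.33 | 4.59 | 10.42 |
| Stage 1 | Shuffle Read local (MB) | 0 | 0 | 0 | 0 |
| Stage 1 | Shuffle Read remote (MB) | 0 | 0 | 0 | 0 |
| Stage 1 | Shuffle Write (MB) | 504.7 | 1009.5 | 2018.7 | 4542 |
| Stage 2 | Scheduler Delay time (s) | 0.08 | 0.14 | 0.07 | 0.11 |
| Stage 2 | Executor Deserialization time (s) | 0.19 | 0.44 | 1.05 | 1.82 |
| Stage 2 | Executor Compute time (s) | 773.52 | 868.59 | 1648.76 | 3859.24 |
| Stage 2 | Shuffle Read time (s) | 0.06 | 0 | 0 | 0.01 |
| Stage 2 | Shuffle Write time (s) | 0 | 0 | 0 | 0 |
| Stage 2 | Shuffle Read local (MB) | 15.6 | 62.5 | 250.6 | 1125.9 |
| Stage 2 | Shuffle Read remote (MB) | 484.4 | 937.9 | 1749.9 | 3375.3 |
| Stage 2 | Shuffle Write (MB) | 0 | 0 | 0 | 0 |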

  1. The table also reports the size, in megabytes, of the corresponding shuffle reads and writes.
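
One way to read the Executor Compute time rows is as a scaling curve. The short Python sketch below is not part of the original article: it copies the compute times from the table and derives the speedup and parallel efficiency of each run relative to the 8-worker run. The names `COMPUTE_TIME` and `relative_speedup` are hypothetical, and the calculation assumes the reported times can be compared directly across worker counts.

```python
# Executor Compute time (seconds), copied from the table, keyed by worker count.
COMPUTE_TIME = {
    "Stage 1": {64: 351.4, 32: 580.9, 16: 1112.51, 8: 2655.48},
    "Stage 2": {64: 773.52, 32: 868.59, 16: 1648.76, 8: 3859.24},
}


def relative_speedup(times, baseline_workers=8):
    """Return {workers: (speedup, efficiency)} relative to the baseline run.

    speedup    = baseline time / time with `workers` workers
    efficiency = speedup / (workers / baseline_workers), i.e. the fraction
                 of the ideal linear speedup that is actually observed.
    """
    base = times[baseline_workers]
    result = {}
    for workers, seconds in sorted(times.items()):
        speedup = base / seconds
        ideal = workers / baseline_workers
        result[workers] = (speedup, speedup / ideal)
    return result


if __name__ == "__main__":
    for stage, times in COMPUTE_TIME.items():
        for workers, (speedup, efficiency) in relative_speedup(times).items():
            print(f"{stage}, {workers:>2} workers: "
                  f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Under that assumption, going from 8 to 64 workers gives a compute-time speedup of roughly 7.56x for Stage 1 (about 94% of the ideal 8x) and roughly 4.99x for Stage 2 (about 62%).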