
Table 4 Running time breakdown, in seconds, of the two FastKmer stages on the 32GB dataset with k fixed to 55 and a decreasing number of executors

From: Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

 

| Metric | 64 Workers | 32 Workers | 16 Workers | 8 Workers |
|---|---|---|---|---|
| Stage 1 | | | | |
| Scheduler Delay time (s) | 0 | 0.1 | 0.1 | 0.2 |
| Executor Deserialization time (s) | 0 | 0.4 | 0.8 | 2.7 |
| Executor Compute time (s) | 293.4 | 569.7 | 1152.8 | 2575.2 |
| Shuffle Read time (s) | 0 | 0 | 0 | 0 |
| Shuffle Write time (s) | 0.8 | 1.7 | 3.3 | 7.4 |
| Shuffle Read local (MB) | 0 | 0 | 0 | 0 |
| Shuffle Read remote (MB) | 0 | 0 | 0 | 0 |
| Shuffle Write (MB) | 504.7 | 1009.5 | 2018.7 | 4542 |
| Stage 2 | | | | |
| Scheduler Delay time (s) | 0 | 0 | 0.1 | 0.1 |
| Executor Deserialization time (s) | 0.2 | 0.44 | 0.4 | 1.4 |
| Executor Compute time (s) | 1083 | 1171.2 | 2060.2 | 4556.4 |
| Shuffle Read time (s) | 0 | 2.3 | 0 | 0 |
| Shuffle Write time (s) | 0 | 0 | 0 | 0 |
| Shuffle Read local (MB) | 15.6 | 62.5 | 250.6 | 1125.9 |
| Shuffle Read remote (MB) | 484.4 | 937.9 | 1749.9 | 3375.3 |
| Shuffle Write (MB) | 0 | 0 | 0 | 0 |

  1. The table also reports the size, in megabytes, of the corresponding shuffle reads and writes
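
One way to read the table is to compare each configuration against the 64-worker run. The sketch below is only an illustration, not part of the original analysis: it copies the Executor Compute times and the Stage 2 shuffle read sizes from the table, then computes the ratio of each compute time to the 64-worker value and the share of shuffle data read locally. The variable and function names (`ratios_to_baseline`, `local_share`) are hypothetical, and no interpretation of how the reported times are aggregated is assumed.

```python
# Values copied from Table 4 (times in seconds, sizes in MB),
# ordered by number of workers: 64, 32, 16, 8.
workers = [64, 32, 16, 8]
compute_time = {
    "Stage 1": [293.4, 569.7, 1152.8, 2575.2],
    "Stage 2": [1083.0, 1171.2, 2060.2, 4556.4],
}
shuffle_read_local = [15.6, 62.5, 250.6, 1125.9]      # Stage 2
shuffle_read_remote = [484.4, 937.9, 1749.9, 3375.3]  # Stage 2


def ratios_to_baseline(values):
    """Divide every value by the first one (the 64-worker run)."""
    base = values[0]
    return [v / base for v in values]


def local_share(local, remote):
    """Fraction of the total shuffle read that was served locally."""
    return [l / (l + r) for l, r in zip(local, remote)]


compute_ratios = {
    stage: ratios_to_baseline(times) for stage, times in compute_time.items()
}
local_fraction = local_share(shuffle_read_local, shuffle_read_remote)

for stage, ratios in compute_ratios.items():
    for w, r in zip(workers, ratios):
        print(f"{stage}, {w} workers: {r:.2f}x the 64-worker compute time")

for w, f in zip(workers, local_fraction):
    print(f"Stage 2, {w} workers: {f:.1%} of shuffle data read locally")
```

With these figures, the reported Stage 1 compute time at 8 workers is roughly 8.8 times the 64-worker value, while the Stage 2 value is roughly 4.2 times its 64-worker value; the locally read share of the Stage 2 shuffle rises from about 3% to about 25% as the number of workers drops.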