Table 2 Summary performance metrics of goSNAP pipeline

From: A hybrid computational strategy to address WGS variant analysis in >5000 samples

| Stage | # of core hours | Time (days) | Data generated (TB) | Data transferred (TB, upload/download) | Median execution time per unit | # of parallel execution threads | Optimal instance (cores/memory) |
|---|---|---|---|---|---|---|---|
| Slicing/Repack | ~48 k/~26 k | 5 | 360 | 180/0 | ~1.15 h/sample (slicing); ~22 h/bin (repacking) | 5297 (slicing); 300 (repacking) | 8 cores, 16 GB/4 cores, 4 GB |
| Calling | ~1.4 million | 14 | 120 | 0/2 | ~60 h/bin | 2797 | 8 cores, 16 GB |
| Genotype Likelihood | ~15.6 k | 1 | 6 | 0/2 | ~1.5 h/BAM | 5297 | 2 cores, 8 GB |
| Imputation and Phasing | ~3.7 million | 30 | 2 | 2/2 | ~30 h/bin @ Rhea; ~13 h/bin @ Blue BioU | 265 k | 32 cores |
| Total | ~5.2 million | 50 | 488 | 182/6 | -- | -- | -- |

  1. The pipeline finished in 50 days and transferred a total of only 6 TB of data, starting from a raw data footprint of 180 TB. Cache data of 360 TB was live for only 14 days. Intermediate results amount to 120 TB and are archived for future use.
  2. The pipeline used approximately 5.2 million core hours.
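
The Total row can be cross-checked against the per-stage rows with a short sketch like the one below. The figures are transcribed from the table and are approximate as reported; the dictionary keys and the helper `column_totals` are hypothetical names chosen for illustration, and summing the days assumes the stages ran one after another.

```python
# Per-stage figures transcribed from Table 2 (approximate, as reported).
# Key names are illustrative assumptions, not identifiers from the pipeline.
stages = {
    "Slicing/Repack": {
        "core_hours": 48_000 + 26_000,  # slicing + repacking
        "days": 5, "generated_tb": 360, "upload_tb": 180, "download_tb": 0,
    },
    "Calling": {
        "core_hours": 1_400_000,
        "days": 14, "generated_tb": 120, "upload_tb": 0, "download_tb": 2,
    },
    "Genotype Likelihood": {
        "core_hours": 15_600,
        "days": 1, "generated_tb": 6, "upload_tb": 0, "download_tb": 2,
    },
    "Imputation and Phasing": {
        "core_hours": 3_700_000,
        "days": 30, "generated_tb": 2, "upload_tb": 2, "download_tb": 2,
    },
}


def column_totals(rows):
    """Sum every metric across all stages."""
    totals = {}
    for metrics in rows.values():
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0) + value
    return totals


totals = column_totals(stages)
for name, value in totals.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the sums come to about 5.19 million core hours, 50 days, 488 TB generated, and 182/6 TB uploaded/downloaded, which agrees with the Total row after rounding.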