Table 2 Summary performance metrics of goSNAP pipeline

From: A hybrid computational strategy to address WGS variant analysis in >5000 samples

| Stage | # of core hours | Time (days) | Data generated (TB) | Data upload/download (TB) | Median execution time/unit | # of parallel execution threads | Optimal instance (cores/mem) |
|---|---|---|---|---|---|---|---|
| Slicing/Repack | ~48 k/~26 k | 5 | 360 | 180/0 | ~1.15 h/sample (slicing)/~22 h/bin (repacking) | 5297 (slicing), 300 (repacking) | 8 cores, 16 GB/4 cores, 4 GB |
| Calling | ~1.4 million | 14 | 120 | 0/2 | ~60 h/bin | 2797 | 8 cores, 16 GB |
| Genotype Likelihood | ~15.6 k | 1 | 6 | 0/2 | ~1.5 h/BAM | 5297 | 2 cores, 8 GB |
| Imputation and Phasing | ~3.7 million | 30 | 2 | 2/2 | ~30 h/bin @ Rhea; ~13 h/bin @ Blue BioU | 265 k | 32 cores |
| Total | ~5.2 million | 50 | 488 | 182/6 | -- | -- | -- |
  1. The pipeline finished in 50 days and transferred a total of only 6 TB of data, starting from a raw data footprint of 180 TB. Cache data of 360 TB was live for only 14 days. Intermediate results amount to 120 TB and are archived for future use.
  2. The pipeline used approximately 5.2 million core hours.
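
The Total row can be cross-checked against the per-stage rows with a few lines of arithmetic. The sketch below is only an illustration: the dictionary layout and the names `stages` and `column_total` are assumptions introduced here, and the inputs are the rounded values printed in the table, so the core-hour sum only approximates the reported ~5.2 million.

```python
# Per-stage figures copied from Table 2 (rounded values as published).
# The field names are illustrative and do not come from the paper.
stages = {
    "Slicing/Repack": {
        "core_hours": 48_000 + 26_000,  # slicing + repacking
        "days": 5, "generated_tb": 360, "upload_tb": 180, "download_tb": 0,
    },
    "Calling": {
        "core_hours": 1_400_000,
        "days": 14, "generated_tb": 120, "upload_tb": 0, "download_tb": 2,
    },
    "Genotype Likelihood": {
        "core_hours": 15_600,
        "days": 1, "generated_tb": 6, "upload_tb": 0, "download_tb": 2,
    },
    "Imputation and Phasing": {
        "core_hours": 3_700_000,
        "days": 30, "generated_tb": 2, "upload_tb": 2, "download_tb": 2,
    },
}


def column_total(column: str) -> int:
    """Sum one numeric column over all pipeline stages."""
    return sum(row[column] for row in stages.values())


columns = ("core_hours", "days", "generated_tb", "upload_tb", "download_tb")
totals = {column: column_total(column) for column in columns}

for column, value in totals.items():
    print(f"{column}: {value:,}")
```

With these inputs the core hours sum to about 5.19 million, which rounds to the reported ~5.2 million, and the remaining columns reproduce the Total row exactly (50 days, 488 TB generated, 182 TB uploaded, 6 TB downloaded).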