
Table 1 Mercury computational resource requirements

From: Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

 

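| Stage | Step | Nodes | RAM (GB) | Hours | Node*hrs |
|---|---|---|---|---|---|
| Data generation | BCL to FastQ | 1 | 48 | 3.62 | 3.62 |
| Alignment | BWA align | 1 | 48 | 1.84 | 1.84 |
| Alignment | BWA sample | 0.333 | 15 | 1.38 | 0.46 |
| BAM finishing | Mates, Dupe, Stats | 0.5 | 28 | 3.39 | 1.70 |
| BAM finishing | Cap & Cvrg Metrics | 0.125 | 3 | 1.30 | 0.16 |
| BAM finishing | GATK indel targets | 1 | 48 | 0.28 | 0.28 |
| BAM finishing | GATK indel realign | 0.333 | 14 | 2.25 | 0.75 |
| BAM finishing | GATK recal | 1 | 32 | 3.04 | 3.04 |
| BAM finishing | BAM valid | 0.125 | 4 | 0.75 | 0.09 |
| Variants | Atlas SNP | 0.167 | 7 | 9.00 | 1.50 |
| Variants | Atlas Indel | 0.167 | 7 | 7.51 | 1.25 |
| Annotation | Cassandra | 0.167 | 8 | 1.71 | 0.29 |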
  1. All estimates are approximate and apply to whole-exome and light-skim whole-genome samples (~10-20 Gbp of data) sequenced on Illumina HiSeq and processed with the most recent versions of RTA and Casava. Nodes are 8-core machines with 48 GB RAM, ~3 GHz Intel CPUs, and ~1 TB of local scratch disk. The steps cover the whole pipeline: building reads and qualities (FASTQ) from raw data (BCL files); alignment and BAM generation with the BWA aligner; BAM finishing with GATK post-processing, duplicate marking, capture and coverage metric calculation, and BAM file validation; variant calling with the Atlas2 suite; and annotation with our annotator, Cassandra.
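
The Node*hrs values in Table 1 appear to be the product of the Nodes and Hours values for each step, rounded to two decimals. The following Python sketch checks that reading and sums the per-step figures. It is an illustration only: the names `STEPS` and `node_hours` are hypothetical, and the numbers are transcribed from the table rather than taken from any Mercury code.

```python
# Values transcribed from Table 1: (step, nodes, hours, reported node-hours).
# The product relationship below is an assumption inferred from the table.
STEPS = [
    ("BCL to FastQ",       1,     3.62, 3.62),
    ("BWA align",          1,     1.84, 1.84),
    ("BWA sample",         0.333, 1.38, 0.46),
    ("Mates, Dupe, Stats", 0.5,   3.39, 1.70),
    ("Cap & Cvrg Metrics", 0.125, 1.30, 0.16),
    ("GATK indel targets", 1,     0.28, 0.28),
    ("GATK indel realign", 0.333, 2.25, 0.75),
    ("GATK recal",         1,     3.04, 3.04),
    ("BAM valid",          0.125, 0.75, 0.09),
    ("Atlas SNP",          0.167, 9.00, 1.50),
    ("Atlas Indel",        0.167, 7.51, 1.25),
    ("Cassandra",          0.167, 1.71, 0.29),
]


def node_hours(nodes, hours):
    """Fraction of a node occupied multiplied by wall-clock hours."""
    return nodes * hours


# Recompute node-hours for every step and total them.
computed = {name: node_hours(n, h) for name, n, h, _ in STEPS}
total = sum(computed.values())

for name, _, _, reported in STEPS:
    print(f"{name:20s} computed {computed[name]:5.2f}  table {reported:5.2f}")
print(f"Total node-hours across all steps: {total:.2f}")
```

Fractional node counts such as 0.125 or 0.333 are consistent with a step using one eighth or one third of an 8-core, 48 GB node, which is why the node-hour cost of a step can be much lower than its wall-clock time.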