
Table 1 Mercury computational resource requirements

From: Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline

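| Stage | Step | Nodes | RAM (GB) | Hours | Node-hours |
|---|---|---:|---:|---:|---:|
| Data generation | BCL to FastQ | 1 | 48 | 3.62 | 3.62 |
| Alignment | BWA align | 1 | 48 | 1.84 | 1.84 |
| Alignment | BWA sample | 0.333 | 15 | 1.38 | 0.46 |
| BAM finishing | Mates, Dupe, Stats | 0.5 | 28 | 3.39 | 1.70 |
| BAM finishing | Capture & coverage metrics | 0.125 | 3 | 1.30 | 0.16 |
| BAM finishing | GATK indel targets | 1 | 48 | 0.28 | 0.28 |
| BAM finishing | GATK indel realign | 0.333 | 14 | 2.25 | 0.75 |
| BAM finishing | GATK recal | 1 | 32 | 3.04 | 3.04 |
| BAM finishing | BAM validation | 0.125 | 4 | 0.75 | 0.09 |
| Variants | Atlas SNP | 0.167 | 7 | 9.00 | 1.50 |
| Variants | Atlas Indel | 0.167 | 7 | 7.51 | 1.25 |
| Annotation | Cassandra | 0.167 | 8 | 1.71 | 0.29 |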
  1. All estimates are approximate and apply to whole exomes and light-skim whole genomes (~10-20 Gbp of data) sequenced on Illumina HiSeq and processed with the most recent versions of RTA and Casava. Nodes are 8-core machines with 48 GB of RAM, ~3 GHz Intel CPUs and ~1 TB of local scratch disk; Nodes is the fraction of one such node that a step occupies, and node-hours are Nodes × Hours. The steps span the whole pipeline: building reads and base qualities (FASTQ) from raw data (BCL files); alignment and BAM generation with the BWA aligner; BAM finishing, comprising GATK post-processing, duplicate marking, capture and coverage metric calculation, and BAM file validation; and finally variant calling with the Atlas2 suite, with annotations from our annotator, Cassandra.
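Because each node-hour figure in Table 1 is simply Nodes × Hours, the table's arithmetic can be re-derived in a few lines. The Python below is a minimal sketch, not part of Mercury: it transcribes the table, recomputes node-hours per step, checks each step's RAM against the 48 GB node described in the footnote, sums node-hours per stage, and converts the total into an idealized throughput figure. The function names, the 0.01 tolerance, the 10-node cluster size, and treating the table as the cost of a single sample are all assumptions made for illustration.

```python
"""Sketch: re-derive the node-hour arithmetic in Table 1 (Mercury resource estimates)."""

NODE_RAM_GB = 48  # per-node RAM stated in the table footnote

# (stage, step, nodes, RAM in GB, hours, published node-hours), in pipeline order.
STEPS = [
    ("Data generation", "BCL to FastQ", 1.0, 48, 3.62, 3.62),
    ("Alignment", "BWA align", 1.0, 48, 1.84, 1.84),
    ("Alignment", "BWA sample", 0.333, 15, 1.38, 0.46),
    ("BAM finishing", "Mates, Dupe, Stats", 0.5, 28, 3.39, 1.70),
    ("BAM finishing", "Capture & coverage metrics", 0.125, 3, 1.30, 0.16),
    ("BAM finishing", "GATK indel targets", 1.0, 48, 0.28, 0.28),
    ("BAM finishing", "GATK indel realign", 0.333, 14, 2.25, 0.75),
    ("BAM finishing", "GATK recal", 1.0, 32, 3.04, 3.04),
    ("BAM finishing", "BAM validation", 0.125, 4, 0.75, 0.09),
    ("Variants", "Atlas SNP", 0.167, 7, 9.00, 1.50),
    ("Variants", "Atlas Indel", 0.167, 7, 7.51, 1.25),
    ("Annotation", "Cassandra", 0.167, 8, 1.71, 0.29),
]


def mismatched_node_hours(steps, tol=0.01):
    """Steps where Nodes x Hours differs from the published node-hours by more than tol."""
    return [step for _, step, nodes, _, hours, nh in steps if abs(nodes * hours - nh) > tol]


def steps_exceeding_ram(steps, node_ram_gb=NODE_RAM_GB):
    """Steps whose RAM requirement would not fit on one footnote-sized node."""
    return [step for _, step, _, ram, _, _ in steps if ram > node_ram_gb]


def node_hours_by_stage(steps):
    """Sum published node-hours per stage; dicts keep insertion (pipeline) order."""
    totals = {}
    for stage, _, _, _, _, nh in steps:
        totals[stage] = totals.get(stage, 0.0) + nh
    return totals


def samples_per_day(cluster_nodes, node_hours_per_sample):
    """Idealized upper bound: assumes perfect packing, no queueing or I/O overhead."""
    return cluster_nodes * 24 / node_hours_per_sample


if __name__ == "__main__":
    print("node-hour mismatches:", mismatched_node_hours(STEPS))
    print("steps exceeding node RAM:", steps_exceeding_ram(STEPS))
    per_stage = node_hours_by_stage(STEPS)
    for stage, nh in per_stage.items():
        print(f"{stage:<16} {nh:5.2f} node-hours")
    per_sample = sum(per_stage.values())
    print(f"total: {per_sample:.2f} node-hours per sample")
    # 10 nodes is an arbitrary, hypothetical cluster size used only for illustration.
    print(f"~{samples_per_day(10, per_sample):.1f} samples/day on 10 nodes (upper bound)")
```

With the table's values, no recomputed node-hour figure differs from the published one by more than 0.01, every step fits within a 48 GB node, and the stage totals add up to roughly 15 node-hours per sample. The throughput function deliberately ignores step dependencies and scheduling overhead, so it is best read as a ceiling rather than a prediction.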