| Stage | Data gen | Alignment | Alignment | BAM finishing | BAM finishing | BAM finishing | BAM finishing | BAM finishing | BAM finishing | Variants | Variants | Anno |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Step | BCL to FastQ | BWA align | BWA sample | Mates, Dupe, Stats | Cap & Cvrg Metrics | GATK indel targets | GATK indel realign | GATK recal | BAM valid | Atlas SNP | Atlas Indel | Cassandra |
| Nodes | 1 | 1 | 0.333 | 0.5 | 0.125 | 1 | 0.333 | 1 | 0.125 | 0.167 | 0.167 | 0.167 |
| RAM (GB) | 48 | 48 | 15 | 28 | 3 | 48 | 14 | 32 | 4 | 7 | 7 | 8 |
| Hours | 3.62 | 1.84 | 1.38 | 3.39 | 1.30 | 0.28 | 2.25 | 3.04 | 0.75 | 9.00 | 7.51 | 1.71 |
| Node*hrs | 3.62 | 1.84 | 0.46 | 1.70 | 0.16 | 0.28 | 0.75 | 3.04 | 0.09 | 1.50 | 1.25 | 0.29 |
- All estimates are approximate and apply to whole-exome and light-skim whole-genome data (~10-20 Gbp) sequenced on Illumina HiSeq and processed with the most recent versions of RTA and Casava. Nodes are 8-core machines with 48 GB RAM, ~3 GHz Intel CPUs, and ~1 TB of local scratch disk. The steps cover the whole pipeline: building reads and qualities (FastQ) from raw data (BCL files); alignment and BAM generation with the BWA aligner; BAM finishing with GATK post-processing, duplicate marking, capture and coverage metric calculation, and BAM file validation; variant calling with the Atlas2 suite; and annotation with our annotator, Cassandra.
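
The Node*hrs row appears to be the product of the Nodes and Hours rows for each step, rounded to two decimals. The sketch below recomputes it from the table's values and sums the result. It is an illustration only: the names `STEPS`, `node_hours`, and `total_node_hours` are hypothetical and not part of the pipeline.

```python
# Values copied from the table: (step name, fraction of a node, wall-clock hours).
STEPS = [
    ("BCL to FastQ", 1, 3.62),
    ("BWA align", 1, 1.84),
    ("BWA sample", 0.333, 1.38),
    ("Mates, Dupe, Stats", 0.5, 3.39),
    ("Cap & Cvrg Metrics", 0.125, 1.30),
    ("GATK indel targets", 1, 0.28),
    ("GATK indel realign", 0.333, 2.25),
    ("GATK recal", 1, 3.04),
    ("BAM valid", 0.125, 0.75),
    ("Atlas SNP", 0.167, 9.00),
    ("Atlas Indel", 0.167, 7.51),
    ("Cassandra", 0.167, 1.71),
]


def node_hours(nodes, hours):
    """Node-hours for one step: node fraction times wall-clock hours."""
    return nodes * hours


def total_node_hours(steps):
    """Sum of node-hours over all steps in the list."""
    return sum(node_hours(nodes, hours) for _, nodes, hours in steps)


if __name__ == "__main__":
    for name, nodes, hours in STEPS:
        print(f"{name:20s} {node_hours(nodes, hours):6.2f}")
    print(f"{'Total':20s} {total_node_hours(STEPS):6.2f}")
```

Summed this way, the twelve steps come to roughly 15 node-hours, which matches the sum of the Node*hrs row to within rounding.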