Fig. 1 | BMC Bioinformatics

Fig. 1

From: DECA: scalable XHMM exome copy-number variant calling with ADAM and Apache Spark

DECA parallelization and performance. (a) DECA parallelization (shown by dashed outline) and data flow. The normalization and discovery steps are parallelized by sample (rows of the samples (s) × targets (t) read-depth matrix). The inputs and outputs of the different components are shown with thinner arrows. (b) DECA and XHMM execution time starting from the read-depth matrix for s = 2535 on both the workstation and the on-premises Hadoop cluster for different numbers of executor cores. Mod. XHMM is a customized XHMM implementation that partitions the discovery input files and invokes XHMM in parallel. (c) DECA execution time for coverage and CNV discovery for different numbers of samples using the entire workstation (16 cores) and cluster (approximately 640 executor cores dynamically allocated by Spark).
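
The by-sample parallelization described in the caption can be illustrated with a minimal sketch. This is not DECA's or XHMM's code: `zscore_row` and `process_by_sample` are hypothetical names, the per-row z-score is only a stand-in for a per-sample step, and a local thread pool stands in for Spark executors. It shows only the general idea that each row of the samples × targets matrix can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def zscore_row(row):
    """Hypothetical per-sample step: center and scale one sample's read depths."""
    mu = mean(row)
    sigma = pstdev(row)
    if sigma == 0:
        # A constant row has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in row]
    return [(x - mu) / sigma for x in row]


def process_by_sample(matrix, workers=4):
    """Apply the per-sample step to every row (sample) in parallel.

    Each row is independent of the others, so rows can be handed to
    separate workers; pool.map returns results in the input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zscore_row, matrix))


# Toy read-depth matrix: 3 samples (rows) x 3 targets (columns).
depths = [
    [10.0, 20.0, 30.0],
    [5.0, 5.0, 5.0],
    [1.0, 2.0, 3.0],
]
result = process_by_sample(depths)
```

Because no row depends on another, the same pattern extends to partitioning the rows across more workers, which is the property the caption attributes to the normalization and discovery steps.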
