GeDi: applying suffix arrays to increase the repertoire of detectable SNVs in tumour genomes

BMC Bioinformatics

Table 4 Runtime and memory (RSS) evaluation of GeDi, SMuFin and MuTect

Dataset	Size (Million reads / GB)	RSS (GB)			Runtime (hours:minutes)
		GeDi (no emfilter)	MuTect	SMuFin (256 thread)	GeDi (no emfilter)	MuTect	SMuFin (256 thread)
SMuFin:chr22	26 / 4.40	16 (19)	67	22 (107)	0:20 (0:30)	1:33	17:48 (3:47)
TSD:chr17	11 / 2.43	7	67	-	0:06	0:50	-
TSD:chr22	4 / 0.87	3	108	-	0:02	0:38	-
MB:L.A	2052 / 69.21	1017	97*	-	71:39	1800*	-

no emfilter (bracketed values for GeDi) shows GeDi’s resource requirements when emfilter is off; for all other GeDi runs, emfilter is on. 256 thread (bracketed values for SMuFin) shows SMuFin’s resource requirements when run with the suggested command at http://cg.bsc.es/smufin/, whilst values without brackets show SMuFin’s requirements when run with 32 logical threads. GeDi and MuTect were always run with 32 logical threads, apart from the analysis of MB:L.A, where 64 logical threads were used for both callers. A full description of the methods used to perform this benchmark is provided in Additional file 1: Method 4. All analyses were performed on the same computing system with Xeon E5-4650v2 CPUs. Runtime and RSS were recorded using GNU Time 1.7 (https://www.gnu.org/), where runtime is the elapsed wall clock time and RSS is the maximum resident set size. Asterisk values: MuTect took 72 hours (the maximum user runtime) on our system to analyse 4% of the human genome (the percentage along the genome is given in MuTect output). Accordingly, assuming uniform coverage of MB:L.A data across the genome, we estimated MuTect’s runtime in hours for analysis of the complete MB:L.A dataset by multiplying 72 by 25, giving 1800 hours.
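The extrapolation behind the asterisked MuTect runtime can be written out as a short calculation. The sketch below is illustrative only and is not part of the benchmark: the function name `extrapolate_runtime_hours` and its parameters are hypothetical, and it assumes, as the footnote does, that progress along the genome is proportional to elapsed time.

```python
def extrapolate_runtime_hours(elapsed_hours: float, percent_done: float) -> float:
    """Estimate the total runtime from a partial run.

    Assumes progress is linear in elapsed time, i.e. uniform coverage
    across the genome (the same assumption stated in the footnote).
    """
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    # Scale the elapsed time by the inverse of the completed fraction.
    return elapsed_hours * 100 / percent_done


# Values taken from the footnote: 72 hours elapsed, 4% of the genome analysed.
estimate = extrapolate_runtime_hours(72, 4)
print(f"Estimated total runtime: {estimate:.0f} hours")
```

With the footnote's figures this reproduces the factor of 25 (100 / 4) and the 1800-hour estimate reported in the table. The estimate is only as good as the uniform-coverage assumption.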

ISSN: 1471-2105