CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU

BMC Bioinformatics

Table 3 Performance comparison of static and runtime compilation for the MSV/SSV kernels

	GCUPS		L1(%)		Read-only(%)		Register(64)^b
Model length	static	nvrtc	static	nvrtc	static	nvrtc	static	nvrtc
200	70/81	79/113	100/100	bypass/bypass^a	100/100	100/100	42/33	32/30
600	118/146	165/235	100/100	bypass/bypass	98.3/98.3	98.1/98.1	42/33	44/30
1001	138/162	209/317	100/100	bypass/bypass	82.9/82.9	83/82.9	42/33	44/44
1400	146/165	239/367	99.9/100	bypass/bypass	74.3/74.3	74.2/74.2	42/33	45/44
2050	139/134	261/392	62.7/62.7	bypass/bypass	65.5/65.6	65.5/65.6	42/33	62/63
2405	139/138	277/440	58.3/58.7	bypass/bypass	63.6/63.5	63.7/63.9	42/33	63/62

^aEach thread has sufficient private registers, so no local memory access is required
^bThe maximum number of registers available per thread is 64

ISSN: 1471-2105