BMC Bioinformatics

Table 4 Performance comparison of static and runtime (nvrtc) compilation for the P7Viterbi kernel

From: CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU

Model length	GCUPS (static)	GCUPS (nvrtc)	L1 (static)	L1 (nvrtc)	Read-only (static)	Read-only (nvrtc)	Register (static)	Register (nvrtc)
200	9.7	8.7	99.9 %	43.4 %	99.1 %	99.1 %	62/64	spill^a
600	12.3	4.9	55.9 %	3.9 %	85.7 %	86.3 %	62/64	spill
1001	10.4	4	52.4 %	2.9 %	74.5 %	75.2 %	62/64	spill
1400	9.6	3.3	50.3 %	1.8 %	68.6 %	69.8 %	62/64	spill
2050	9.3	2.9	49.4 %	0.5 %	62.7 %	62.6 %	62/64	spill
2405	10	2.9	49.5 %	0.4 %	61 %	61.4 %	62/64	spill

^a The assigned private registers are exhausted, so registers spill to local memory.
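
The throughput gap between the two compilation modes is easier to see as a ratio. The following is a minimal sketch, not part of the original article, that divides the static GCUPS by the nvrtc GCUPS for each model length. The values are copied from the table; the names `gcups`, `slowdown_factor`, and `slowdown` are hypothetical and chosen only for illustration.

```python
# GCUPS values copied from Table 4: model length -> (static, nvrtc).
gcups = {
    200: (9.7, 8.7),
    600: (12.3, 4.9),
    1001: (10.4, 4.0),
    1400: (9.6, 3.3),
    2050: (9.3, 2.9),
    2405: (10.0, 2.9),
}


def slowdown_factor(static, nvrtc):
    """Hypothetical helper: how many times faster static compilation is."""
    return static / nvrtc


# Ratio of static to nvrtc throughput for every model length in the table.
slowdown = {
    length: slowdown_factor(static, nvrtc)
    for length, (static, nvrtc) in gcups.items()
}

for length, factor in slowdown.items():
    print(f"model length {length}: static is {factor:.2f}x faster than nvrtc")
```

Under these numbers the ratio is smallest for the shortest model and largest for the longest one, which matches the trend in the L1 column, where the nvrtc figure falls much faster than the static figure as the model length grows.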


ISSN: 1471-2105
