
Table 4 Performance comparison of static and runtime compilation for P7Viterbi kernel

From: CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU

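| Model length | GCUPS (static) | GCUPS (nvrtc) | L1 cache hit rate (static) | L1 cache hit rate (nvrtc) | Read-only cache hit rate (static) | Read-only cache hit rate (nvrtc) | Registers (static) | Registers (nvrtc) |
|---|---|---|---|---|---|---|---|---|
| 200 | 9.7 | 8.7 | 99.9 % | 43.4 % | 99.1 % | 99.1 % | 62/64 | spill^a |
| 600 | 12.3 | 4.9 | 55.9 % | 3.9 % | 85.7 % | 86.3 % | 62/64 | spill |
| 1001 | 10.4 | 4.0 | 52.4 % | 2.9 % | 74.5 % | 75.2 % | 62/64 | spill |
| 1400 | 9.6 | 3.3 | 50.3 % | 1.8 % | 68.6 % | 69.8 % | 62/64 | spill |
| 2050 | 9.3 | 2.9 | 49.4 % | 0.5 % | 62.7 % | 62.6 % | 62/64 | spill |
| 2405 | 10.0 | 2.9 | 49.5 % | 0.4 % | 61.0 % | 61.4 % | 62/64 | spill |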
^a Assigned private registers are exhausted, so registers spill to local memory.
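The GCUPS columns report throughput in giga cell updates per second. The Python sketch below assumes the conventional definition (model length × database residues ÷ runtime ÷ 10^9) and uses only the GCUPS values transcribed from the table to compute how much faster the static build is than the nvrtc build at each model length. The helper names `gcups`, `static_speedup` and `TABLE4_GCUPS` are hypothetical illustrations, not part of CUDAMPF.

```python
# Assumed conventional GCUPS definition: one dynamic-programming cell per
# model state per database residue, divided by runtime, scaled to 10^9.
def gcups(model_length: int, total_residues: int, seconds: float) -> float:
    return model_length * total_residues / (seconds * 1e9)


# GCUPS values transcribed from Table 4: model length -> (static, nvrtc).
TABLE4_GCUPS = {
    200: (9.7, 8.7),
    600: (12.3, 4.9),
    1001: (10.4, 4.0),
    1400: (9.6, 3.3),
    2050: (9.3, 2.9),
    2405: (10.0, 2.9),
}


def static_speedup(table: dict) -> dict:
    """Ratio of static to nvrtc GCUPS per model length, rounded to 2 decimals."""
    return {m: round(static / nvrtc, 2) for m, (static, nvrtc) in table.items()}


if __name__ == "__main__":
    for length, ratio in static_speedup(TABLE4_GCUPS).items():
        print(f"model length {length}: static is {ratio}x nvrtc")
```

With these numbers the static kernel is about 1.11× faster at model length 200 and about 3.45× faster at 2405, which matches the widening gap in the table as the nvrtc build's L1 cache hit rate collapses.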