Table 3 Performance comparison of static and runtime compilation for MSV/SSV kernel

From: CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU

| Model length | GCUPS, static | GCUPS, nvrtc | L1(%), static | L1(%), nvrtc | Read-only(%), static | Read-only(%), nvrtc | Register(64) (b), static | Register(64) (b), nvrtc |
|---|---|---|---|---|---|---|---|---|
| 200 | 70/81 | 79/113 | 100/100 | bypass/bypass (a) | 100/100 | 100/100 | 42/33 | 32/30 |
| 600 | 118/146 | 165/235 | 100/100 | bypass/bypass | 98.3/98.3 | 98.1/98.1 | 42/33 | 44/30 |
| 1001 | 138/162 | 209/317 | 100/100 | bypass/bypass | 82.9/82.9 | 83/82.9 | 42/33 | 44/44 |
| 1400 | 146/165 | 239/367 | 99.9/100 | bypass/bypass | 74.3/74.3 | 74.2/74.2 | 42/33 | 45/44 |
| 2050 | 139/134 | 261/392 | 62.7/62.7 | bypass/bypass | 65.5/65.6 | 65.5/65.6 | 42/33 | 62/63 |
| 2405 | 139/138 | 277/440 | 58.3/58.7 | bypass/bypass | 63.6/63.5 | 63.7/63.9 | 42/33 | 63/62 |
  1. (a) Sufficient private registers for each thread, so there is no demand for local memory access
  2. (b) The maximum number of available registers per thread