
Table 3 Performance comparison of static and runtime (nvrtc) compilation for the MSV/SSV kernels

From: CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU

 

| Model length | GCUPS static | GCUPS nvrtc | L1 (%) static | L1 (%) nvrtc | Read-only (%) static | Read-only (%) nvrtc | Register (64)ᵇ static | Register (64)ᵇ nvrtc |
|---|---|---|---|---|---|---|---|---|
| 200 | 70/81 | 79/113 | 100/100 | bypass/bypassᵃ | 100/100 | 100/100 | 42/33 | 32/30 |
| 600 | 118/146 | 165/235 | 100/100 | bypass/bypass | 98.3/98.3 | 98.1/98.1 | 42/33 | 44/30 |
| 1001 | 138/162 | 209/317 | 100/100 | bypass/bypass | 82.9/82.9 | 83/82.9 | 42/33 | 44/44 |
| 1400 | 146/165 | 239/367 | 99.9/100 | bypass/bypass | 74.3/74.3 | 74.2/74.2 | 42/33 | 45/44 |
| 2050 | 139/134 | 261/392 | 62.7/62.7 | bypass/bypass | 65.5/65.6 | 65.5/65.6 | 42/33 | 62/63 |
| 2405 | 139/138 | 277/440 | 58.3/58.7 | bypass/bypass | 63.6/63.5 | 63.7/63.9 | 42/33 | 63/62 |

Each cell lists the MSV/SSV values for that configuration.

  1. ᵃ Sufficient private registers for each thread and no demand for local memory access
  2. ᵇ The maximum number of available registers per thread
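The main trend in the GCUPS columns, that the nvrtc-compiled kernels pull further ahead of the static build as model length grows, can be quantified by dividing the nvrtc GCUPS by the static GCUPS in each row. The Python sketch below does that arithmetic on the values transcribed from the table; the dictionary layout and the `speedups` helper are illustrative assumptions, not part of CUDAMPF.

```python
# GCUPS transcribed from Table 3:
# model length -> ((static MSV, static SSV), (nvrtc MSV, nvrtc SSV))
GCUPS = {
    200: ((70, 81), (79, 113)),
    600: ((118, 146), (165, 235)),
    1001: ((138, 162), (209, 317)),
    1400: ((146, 165), (239, 367)),
    2050: ((139, 134), (261, 392)),
    2405: ((139, 138), (277, 440)),
}


def speedups(table):
    """Return {model_length: (msv_ratio, ssv_ratio)}, each ratio = nvrtc GCUPS / static GCUPS."""
    result = {}
    for length, ((s_msv, s_ssv), (n_msv, n_ssv)) in table.items():
        result[length] = (n_msv / s_msv, n_ssv / s_ssv)
    return result


if __name__ == "__main__":
    # Print one line per model length, in ascending order.
    for length, (msv, ssv) in sorted(speedups(GCUPS).items()):
        print(f"{length:>5}  MSV x{msv:.2f}  SSV x{ssv:.2f}")
```

Running it shows the ratio climbing from about 1.13x (MSV) and 1.40x (SSV) at model length 200 to about 1.99x and 3.19x at model length 2405.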