Table 4 Performance comparison of static and runtime compilation for P7Viterbi kernel

From: CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU

 

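| Model length | GCUPS (static) | GCUPS (nvrtc) | L1 (static) | L1 (nvrtc) | Read-only (static) | Read-only (nvrtc) | Register (static) | Register (nvrtc) |
|---|---|---|---|---|---|---|---|---|
| 200 | 9.7 | 8.7 | 99.9 % | 43.4 % | 99.1 % | 99.1 % | 62/64 | spill^a |
| 600 | 12.3 | 4.9 | 55.9 % | 3.9 % | 85.7 % | 86.3 % | 62/64 | spill |
| 1001 | 10.4 | 4 | 52.4 % | 2.9 % | 74.5 % | 75.2 % | 62/64 | spill |
| 1400 | 9.6 | 3.3 | 50.3 % | 1.8 % | 68.6 % | 69.8 % | 62/64 | spill |
| 2050 | 9.3 | 2.9 | 49.4 % | 0.5 % | 62.7 % | 62.6 % | 62/64 | spill |
| 2405 | 10 | 2.9 | 49.5 % | 0.4 % | 61 % | 61.4 % | 62/64 | spill |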

a. Assigned private registers are exhausted. Registers spill to local memory.
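
The throughput gap between the two compilation modes is easier to read as a ratio of static to nvrtc GCUPS for each model length. The following is a minimal Python sketch of that arithmetic, using only the GCUPS values from the table; the dictionary layout and the function name `static_over_nvrtc` are illustrative assumptions, not part of CUDAMPF.

```python
# GCUPS values copied from Table 4: model length -> (static, nvrtc).
# The data layout and function name are illustrative, not from the paper.
gcups = {
    200: (9.7, 8.7),
    600: (12.3, 4.9),
    1001: (10.4, 4.0),
    1400: (9.6, 3.3),
    2050: (9.3, 2.9),
    2405: (10.0, 2.9),
}


def static_over_nvrtc(table):
    """Return {model_length: static GCUPS / nvrtc GCUPS}."""
    return {length: static / nvrtc for length, (static, nvrtc) in table.items()}


ratios = static_over_nvrtc(gcups)
for length, ratio in ratios.items():
    # A ratio above 1 means the statically compiled kernel is faster.
    print(f"{length:>5}: {ratio:.2f}x")
```

On these numbers the ratio is above 1 for every model length and largest at model length 2405.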