Block-based calculation of the PaPaRa scoring matrix. The dynamic programming matrix is calculated in blocks, starting from the left-most square block and proceeding toward the right-most square block. Within each block, the rows are calculated one after the other. Block 1 stops when the last row of the matrix is reached; because of its size, it operates on global memory. The main part of the reference is processed in rectangular blocks (Blocks 2 and 3), which are significantly smaller and operate on shared memory. Block 4 processes the last part of the reference sequence, again in global memory. Handling Blocks 1 and 4 in global memory is an engineering trade-off: it avoids if-else conditional statements in the GPU kernel that would otherwise slow down Blocks 2 and 3. Processing real-world references requires many more than two iterations over the rectangular blocks.
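
The left-to-right, row-by-row block traversal described in the caption can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the source: a unit-cost edit-distance recurrence stands in for the PaPaRa scoring function, the blocks are plain vertical column strips rather than the exact square and rectangular shapes of the figure, and a small per-block buffer merely mimics the role of shared memory. The function names and the `block_widths` parameter are hypothetical. The sketch only shows that each block needs nothing from its predecessor except the right-most column.

```python
def dp_full(query, ref):
    """Baseline: fill the whole DP matrix row by row, return its last column.
    Unit-cost edit distance is a stand-in recurrence, not PaPaRa scoring."""
    rows, cols = len(query) + 1, len(ref) + 1
    m = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        m[i][0] = i
    for j in range(cols):
        m[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = int(query[i - 1] != ref[j - 1])
            m[i][j] = min(m[i - 1][j - 1] + mismatch,
                          m[i - 1][j] + 1,
                          m[i][j - 1] + 1)
    return [m[i][cols - 1] for i in range(rows)]


def dp_blocked(query, ref, block_widths):
    """Fill the same matrix in column blocks, from left to right.
    Within a block, rows are computed one after the other; only the
    right-most column of a block is handed to the next block."""
    assert sum(block_widths) == len(ref)
    rows = len(query) + 1
    boundary = list(range(rows))  # column 0 of the full matrix
    start = 0                     # number of reference columns already done
    for width in block_widths:
        # Per-block buffer (hypothetical stand-in for a shared-memory tile):
        # column 0 holds the carried boundary, columns 1..width are new.
        block = [[0] * (width + 1) for _ in range(rows)]
        for i in range(rows):
            block[i][0] = boundary[i]
        for j in range(1, width + 1):
            block[0][j] = start + j  # first-row initialisation
        for i in range(1, rows):
            for j in range(1, width + 1):
                mismatch = int(query[i - 1] != ref[start + j - 1])
                block[i][j] = min(block[i - 1][j - 1] + mismatch,
                                  block[i - 1][j] + 1,
                                  block[i][j - 1] + 1)
        boundary = [block[i][width] for i in range(rows)]
        start += width
    return boundary
```

Called as `dp_blocked("GATTACA", "GCATGCATTACAGG", [5, 2, 2, 5])`, the sketch uses two wide outer blocks and two narrow inner blocks, loosely mirroring Blocks 1 to 4, and returns the same last column as `dp_full` on the same inputs.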