Fig. 5. From: rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs

Fine-Grained Parallelization of Stage 3. Parallelization of the enrichment score computation operating on the ranked genes and precomputed bit masks. Again, each thread block processes a permutation. The threads within a thread block independently accumulate the running sum statistic for each of the probed gene sets. Shared memory is utilized to suppress redundant reads from global memory.
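
The scheme in this caption can be illustrated with a small sequential sketch. It is not the rapidGSEA kernel: it is a plain Python emulation of what a single thread block might do for one permutation, assuming the standard weighted GSEA running sum (a hit adds |r|^p divided by the total hit weight, a miss subtracts 1/(N - N_H)). The function name `enrichment_scores`, the parameters `p` and `tile`, and the layout of `masks` (one integer per gene, bit s set if the gene belongs to set s) are hypothetical choices made for illustration. The staged tile stands in for shared memory: each ranked gene is fetched once and then reused by every per-set accumulator.

```python
def enrichment_scores(order, corr, masks, num_sets, p=1.0, tile=4):
    """Sequentially emulate one thread block scoring one permutation.

    order    -- gene indices sorted by decreasing correlation (ranked genes)
    corr     -- correlation value per gene, indexed by gene
    masks    -- per-gene bit mask; bit s is set if the gene is in gene set s
    num_sets -- number of probed gene sets (one emulated thread per set)
    p, tile  -- hypothetical weight exponent and staging tile size
    """
    n = len(order)

    # Per-set normalizers: total hit weight and number of hits.
    hit_weight = [0.0] * num_sets
    hit_count = [0] * num_sets
    for g in order:
        for s in range(num_sets):
            if (masks[g] >> s) & 1:
                hit_weight[s] += abs(corr[g]) ** p
                hit_count[s] += 1

    running = [0.0] * num_sets  # running sum, one accumulator per "thread"
    best = [0.0] * num_sets     # signed maximum deviation from zero

    for start in range(0, n, tile):
        # Stand-in for shared memory: read each ranked gene's weight and
        # bit mask once per tile, then let every accumulator reuse it.
        staged = [(abs(corr[g]) ** p, masks[g])
                  for g in order[start:start + tile]]

        for s in range(num_sets):  # each iteration plays the role of a thread
            # The score is undefined for empty, all-inclusive, or
            # zero-weight sets; leave those at 0.0 in this sketch.
            if hit_count[s] in (0, n) or hit_weight[s] == 0.0:
                continue
            miss_step = 1.0 / (n - hit_count[s])
            for weight, mask in staged:
                if (mask >> s) & 1:
                    running[s] += weight / hit_weight[s]
                else:
                    running[s] -= miss_step
                if abs(running[s]) > abs(best[s]):
                    best[s] = running[s]
    return best
```

Because every accumulator walks the ranked genes in the same order regardless of the tile size, changing `tile` only changes how often data is staged, not the resulting scores. On a GPU the inner loop over `s` would run concurrently across threads, with a synchronization barrier between staging a tile and consuming it.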