Fig. 4 | BMC Bioinformatics

Fig. 4

From: rapidGSEA: Speeding up gene set enrichment analysis on multi-core CPUs and CUDA-enabled GPUs

Fine-Grained Parallelization of Stages 1 and 2. Parallelization of the deviation score computation operating on the transposed data matrix D^T. Each thread block draws a permutation by shuffling the original phenotype label list in shared memory. The threads within a thread block independently accumulate gene transcription differences for each gene symbol identifier (along columns), ensuring coalesced reads from global memory. Finally, the local deviation scores are sorted using the segmented radix sort primitive of CUB.
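The scheme in the caption can be illustrated with a small CPU-side sketch in plain Python. This is not the rapidGSEA kernel: the function names `deviation_scores` and `permuted_rankings` are hypothetical, the deviation score is assumed to be a simple difference of class means, the labels are assumed to be binary (0/1), and the descending sort order is also an assumption. The outer loop stands in for the thread blocks (one permutation each), the inner loop over genes stands in for the threads of a block, and Python's `sorted` stands in for the segmented radix sort.

```python
import random


def deviation_scores(d_t, labels):
    """Per-gene score for one label assignment.

    d_t[sample][gene] is the transposed data matrix D^T; labels[sample] is 0 or 1.
    Assumption: the score is mean(class 1) - mean(class 0).
    """
    num_genes = len(d_t[0])
    sums = [[0.0] * num_genes, [0.0] * num_genes]
    counts = [labels.count(0), labels.count(1)]
    for row, label in zip(d_t, labels):      # walk the samples (rows of D^T)
        for gene in range(num_genes):        # one "thread" per gene column
            # Neighbouring genes sit next to each other in the row, which is
            # the access pattern that makes the GPU reads coalesced.
            sums[label][gene] += row[gene]
    return [sums[1][g] / counts[1] - sums[0][g] / counts[0]
            for g in range(num_genes)]


def permuted_rankings(d_t, labels, num_perms, seed=0):
    """Return one (scores, order) pair per permutation."""
    rng = random.Random(seed)
    segments = []
    for _ in range(num_perms):               # one "thread block" per permutation
        shuffled = list(labels)              # block-local copy of the label list
        rng.shuffle(shuffled)                # draw the permutation
        scores = deviation_scores(d_t, shuffled)
        # Segmented sort: gene indices are ordered within this permutation only.
        order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
        segments.append((scores, order))
    return segments


if __name__ == "__main__":
    d_t = [[1.0, 5.0, 2.0],
           [3.0, 5.0, 0.0],
           [5.0, 1.0, 2.0],
           [7.0, 1.0, 4.0]]
    labels = [0, 0, 1, 1]
    print(deviation_scores(d_t, labels))
    for scores, order in permuted_rankings(d_t, labels, num_perms=3):
        print(order, scores)
```

Storing the matrix transposed (samples as rows, genes as columns) is what lets consecutive threads touch consecutive memory addresses; in the sketch this shows up only as the inner loop running over the contiguous entries of one row.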
