Fig. 2 | BMC Bioinformatics

From: LASSIE: simulating large-scale models of biochemical systems on GPUs

Threads and memory hierarchy of the CUDA architecture. Left side. Thread organization: the host (CPU) launches a single kernel that is executed by multiple threads on the device (GPU). Threads (red cubes) are organized into three-dimensional structures called blocks (yellow cubes), which in turn belong to a three-dimensional grid (green cube). The programmer must explicitly define the dimensions of blocks and grids. Whenever a kernel is launched by the host, the corresponding grid is created on the device, which automatically schedules each block on an available streaming multiprocessor. This solution allows transparent performance scaling across different devices. Moreover, if the machine is equipped with more than one GPU, the workload can also be distributed by launching the kernel on each GPU. Right side. Memory hierarchy: CUDA provides several memories with different scopes. Each thread has two kinds of private memory: registers and local memory. Threads belonging to the same block can communicate through the shared memory, which has low access latency. The global memory suffers from high access latency, but it is accessible to all threads and has been cached since the introduction of the Fermi architecture. The texture and constant memories are also cached, and all threads can read from them
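To make the hierarchy in the figure concrete, here is a minimal CUDA sketch, not taken from LASSIE itself: the host launches one kernel over a grid of blocks, per-thread scalars live in registers, threads within a block cooperate through low-latency shared memory, and inputs and outputs reside in global memory. The kernel name, block size, and reduction logic are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread loads one value from global memory,
// and the threads of a block cooperate through shared memory to sum them.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float tile[256];               // shared memory: visible to the whole block
    int tid = threadIdx.x;                    // thread index within the block (a register)
    int gid = blockIdx.x * blockDim.x + tid;  // global index across the grid

    tile[tid] = (gid < n) ? in[gid] : 0.0f;   // global memory -> shared memory
    __syncthreads();                          // synchronize the threads of this block

    // Tree reduction inside the block, entirely in shared memory
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = tile[0];  // one partial sum per block -> global memory
}

int main() {
    const int n = 1 << 20;
    const int threads = 256;                  // block dimensions, chosen by the programmer
    const int blocks = (n + threads - 1) / threads;

    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, blocks * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    // The host launches a single kernel; the device creates the grid and
    // schedules its blocks onto free streaming multiprocessors automatically.
    blockSum<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    float total = 0.0f;
    for (int i = 0; i < blocks; ++i) total += out[i];
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Because the block is the unit of scheduling, the same binary scales transparently: a GPU with more streaming multiprocessors simply runs more of these blocks concurrently.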
