Figure 3 (From: NMF-mGPU: non-negative matrix factorization on multi-GPU systems)

Update rule for matrix H. Matrix V is transferred blockwise, while W and H are fully loaded into GPU memory at algorithm start. Nevertheless, both V and H are processed in portions of size n × b_m and k × b_m, respectively (b_m ≤ m). Circled operations denote CUDA kernels. The symbols ".*" and "./" denote pointwise matrix operations. Updated columns of H are marked with a large down arrow. Finally, the squared region at the bottom left represents the reduction and accumulation of the updated columns into a single k-length vector, which is required by the next update-W rule.
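The blockwise processing described in the caption can be sketched on the host side as follows. This is a minimal NumPy stand-in, not the paper's CUDA implementation: it assumes the Lee-Seung multiplicative update for H in its divergence form, whose pointwise "./" and ".*" operations and k-length accumulation for the subsequent W update match the operations named in the caption. The function name `update_H_blockwise` and the small epsilon guard are illustrative choices.

```python
import numpy as np

def update_H_blockwise(V, W, H, b_m):
    # Sketch: multiplicative update of H applied per column block of
    # width b_m, mirroring the figure's n x b_m / k x b_m portions.
    # Host-side NumPy illustration, not the paper's CUDA kernels.
    k, m = H.shape
    eps = np.finfo(V.dtype).eps                # guard against division by zero
    w_col_sums = W.sum(axis=0)[:, None]        # k x 1, shared by all blocks
    for j in range(0, m, b_m):
        Vb = V[:, j:j + b_m]                   # n x b_m block of V
        Hb = H[:, j:j + b_m]                   # k x b_m block of H
        WHb = W @ Hb                           # n x b_m
        # pointwise ./ and .* operations, as in the figure
        H[:, j:j + b_m] = Hb * (W.T @ (Vb / (WHb + eps))) / (w_col_sums + eps)
    # reduce the updated columns into a single k-length vector, as in the
    # squared region at the bottom left (here assumed to be row sums of H,
    # the quantity needed by the divergence-form update-W rule)
    return H, H.sum(axis=1)
```

On a GPU, each block's update runs as a sequence of kernels while the next n × b_m portion of V is transferred, which is why only W and H need to reside fully in device memory.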