GPU Memory Hierarchy & Optimization
Master the GPU memory hierarchy, from registers to global memory, and understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance.