GPU Memory Hierarchy & Optimization
Master GPU memory hierarchy from registers to global memory, understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance
Explore machine learning concepts related to cuda. Clear explanations and practical insights.
Master GPU memory hierarchy from registers to global memory, understand coalescing patterns, bank conflicts, and optimization strategies for maximum performance
NVIDIA Unified Virtual Memory (UVM): on-demand page migration, memory oversubscription, and simplified CPU-GPU memory management.
Understanding character devices, major/minor numbers, and the device file hierarchy created by NVIDIA drivers for GPU access in Linux.
Learn CUDA Multi-Process Service (MPS) for GPU sharing. Enable concurrent kernel execution from multiple processes and maximize GPU utilization.
Understanding PyTorch pin_memory for faster CPU to GPU data transfers using DMA (Direct Memory Access) and page-locked memory.
NVIDIA Tensor Cores explained: mixed-precision matrix operations delivering 10x speedups for AI training and inference on CUDA GPUs.
Deep dive into the fundamental processing unit of modern GPUs - the Streaming Multiprocessor architecture, execution model, and memory hierarchy