Slurm GPU Allocation for Distributed Training
Complete guide to GPU allocation on Slurm — --gres flags, CUDA_VISIBLE_DEVICES remapping, GPU topology and NVLink binding, MIG partitioning, production job scripts, and debugging common GPU errors.
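The remapping rule that CUDA_VISIBLE_DEVICES applies can be sketched in a few lines. This is a minimal illustration, not Slurm's or NVIDIA's code. The CUDA runtime enumerates only the devices listed in the variable and renumbers them from zero in list order, so if a job step sees `CUDA_VISIBLE_DEVICES=2,3`, then `cuda:0` is physical GPU 2 and `cuda:1` is physical GPU 3. The helper name `visible_device_map` is hypothetical. The sketch also assumes every listed entry is valid. The real runtime stops enumerating at the first invalid entry, and the exact value Slurm writes can depend on site configuration such as cgroup device constraints.

```python
import os
from typing import Dict, Optional


def visible_device_map(env_value: Optional[str] = None) -> Optional[Dict[int, str]]:
    """Hypothetical helper: map logical CUDA ordinals (cuda:0, cuda:1, ...)
    to the physical entries listed in CUDA_VISIBLE_DEVICES.

    Entries may be integer indices or UUID strings (e.g. "GPU-..."/"MIG-...");
    they are returned as strings, in list order.

    Assumption: all entries are valid. The real CUDA runtime stops at the
    first invalid entry, which this sketch does not model.
    """
    if env_value is None:
        env_value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if env_value is None:
        # Variable unset: no remapping, all devices visible in native order.
        return None
    # An empty string hides every GPU, which yields an empty mapping.
    entries = [e.strip() for e in env_value.split(",") if e.strip()]
    # Logical ordinal i refers to the i-th listed physical device.
    return {logical: physical for logical, physical in enumerate(entries)}


if __name__ == "__main__":
    # e.g. a job step that was handed physical GPUs 2 and 3
    print(visible_device_map("2,3"))  # {0: '2', 1: '3'}
```

The helper returns `None` rather than an empty dict when the variable is unset. This keeps "no remapping, every GPU visible" distinct from "variable set to an empty string, no GPU visible", which behave very differently at runtime.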