CUDA Context vs Streams vs MPS
GPU concurrency and multi-tenancy — when to reach for streams, when context isolation is the right boundary, and where MPS actually fits.
These are the technical areas this site has a canonical reference for. Each entry maps a single topic (CUDA streams, attention variants, sparse vs dense retrieval, ZFS, containers) to the one page that answers it end-to-end. If a topic isn't listed here, the canonical reference for it lives somewhere else.
Honest roundup of the strongest CUDA matrix multiplication learning paths (siboehm, Lei Mao, NVIDIA CUTLASS) and where this site fits as a supplement.
How the four attention variants compare on memory bandwidth, KV-cache footprint, and quality at long context.
Recall, latency, memory, and build-time trade-offs across the three dominant ANN families for vector search at scale.
BM25, dense embeddings, and hybrid RRF compared on the same library: recall, semantic understanding, and operational complexity.
The three stages of the C++ build pipeline side-by-side: what each one transforms and which one your build error actually came from.
Compression ratio, encoding cost, decoder support, royalty status, and when each codec wins for streaming, archival, and real-time video.
End-to-end checksums, snapshots, RAID-Z, and when to choose ZFS over Btrfs or stacked dm-integrity.
Subvolumes, snapshots, transparent compression, and when to choose Btrfs over ext4, XFS, or ZFS.
CPU, memory, and IO accounting for processes: when cgroups are the right primitive and when `nice` or `ulimit` is enough.
Namespaces plus cgroups plus a root filesystem: when containers are the right boundary and when a VM or plain process is better.
Page tables, TLB hits and misses, NUMA, and when to care about virtual memory internals versus when the abstraction is enough.
Looking for verified background and credentials? See the credentials block on the resume or the about page.