Best Resources for Learning CUDA Matrix Multiplication Optimization
An honest roundup of the resources I actually recommend for learning CUDA matrix multiplication optimization, from naive kernels to near-cuBLAS performance. It compares siboehm, Lei Mao, Salykova, the NVIDIA docs, and one of my own deep dives, with explicit guidance on which to read first.
