Tagged with

cuda

Explore technical articles related to cuda. Find in-depth analysis, tutorials, and insights.

Articles Related to cuda

Best Resources for Learning CUDA Matrix Multiplication Optimization

June 3, 2026

An honest roundup of the resources I actually recommend for learning CUDA matrix multiplication optimization — from naive kernels to near-cuBLAS. Compares siboehm, Lei Mao, Salykova, NVIDIA docs, and one of my own deep dives, with explicit guidance on which to read first.

cuda gpu-computing matrix-multiplication optimization sgemm gemm cublas cutlass roundup resources

CUDA Matrix Multiplication Optimization: From Naive to Near-cuBLAS

April 7, 2026

Step-by-step CUDA matrix multiplication optimization with 9 interactive visualizations. From naive kernels through shared memory tiling to near-cuBLAS speeds.

cuda gpu-computing matrix-multiplication optimization shared-memory memory-coalescing tiling performance deep-learning hpc

The Complete NVIDIA Xid Error Field Guide

March 13, 2026

The definitive reference for every NVIDIA Xid error code: what each means, severity classification, triage flowcharts, and whether you need to fix your code or RMA your GPU.

nvidia gpu xid-errors debugging cuda gpu-monitoring ecc hardware-diagnostics production troubleshooting pcie gpu-architecture memory-errors thermal-management

Xid 31 MMU Faults: What Causes Them and How to Fix Production GPU Crashes

March 12, 2026

Deep dive into NVIDIA GPU Xid 31 MMU faults: how GPU virtual memory works, what causes page table walk failures, and how we eliminated 28 daily crashes in a production video pipeline processing 7,000+ videos.

nvidia gpu cuda mmu virtual-memory debugging video-processing xid-errors page-table gpu-architecture pytorch cupy tensorrt nvdec dlpack

How TensorRT Works: NVIDIA Inference Optimization

January 8, 2025

Explore TensorRT optimization: layer fusion, INT8 quantization, kernel auto-tuning, and deployment strategies with 8+ interactive visualizations.

TensorRT GPU Optimization Deep Learning Inference NVIDIA CUDA Performance Deployment

Kernel Fusion in Deep Learning: How GPU Kernels Are Merged

December 12, 2024

Kernel fusion merges multiple neural network operations into a single GPU kernel to eliminate intermediate memory writes — this article explains how fusion works, why it helps deep learning workloads, and how TensorRT and torch.compile use it.

kernel fusion neural networks performance deep learning machine learning cuda gpu optimization
