Tagged with

performance

Explore technical articles related to performance. Find in-depth analysis, tutorials, and insights.


Articles Related to performance

CUDA Matrix Multiplication Optimization: From Naive to Near-cuBLAS

April 7, 2026

Step-by-step CUDA matrix multiplication optimization with 9 interactive visualizations. From naive kernels through shared memory tiling to near-cuBLAS speeds.

cuda, gpu-computing, matrix-multiplication, optimization, shared-memory, memory-coalescing, tiling, performance, deep-learning, hpc


How TensorRT Works: NVIDIA Inference Optimization

January 8, 2025

Explore TensorRT optimization: layer fusion, INT8 quantization, kernel auto-tuning, and deployment strategies with 8+ interactive visualizations.

TensorRT, GPU, Optimization, Deep Learning, Inference, NVIDIA, CUDA, Performance, Deployment


Kernel Fusion in Deep Learning: How GPU Kernels Are Merged

December 12, 2024

Kernel fusion merges multiple neural network operations into a single GPU kernel to eliminate intermediate memory writes. This article explains how fusion works, why it helps deep learning workloads, and how TensorRT and torch.compile use it.

kernel fusion, neural networks, performance, deep learning, machine learning, cuda, gpu, optimization


CPython Internals: How Python Really Works Under the Hood

January 10, 2024

Deep dive into CPython internals: bytecode compilation, memory management, the GIL, object model, and garbage collection.

Python, CPython, Internals, Memory Management, GIL, Bytecode, Garbage Collection, Performance
